8a. Free probability
We have seen in chapters 5-6 that the easiness property of [math]O_N,U_N[/math] and [math]O_N^+,U_N^+[/math] leads to a number of interesting probabilistic consequences, notably in what concerns the computation of the law of the main character [math]\chi[/math], in the [math]N\to\infty[/math] limit. Our purpose here will be two-fold. On one hand, we would like to have similar results for the various quantum groups introduced in chapter 7. And on the other hand, we would like to upgrade our results about characters [math]\chi[/math] into something more advanced.
In order to do this, we are in need of more probability knowledge. We have certainly met some classical and free probability in chapters 5-6, but the computations and results there were a bit ad-hoc, adapted to what we wanted to prove about [math]O_N,U_N[/math] and [math]O_N^+,U_N^+[/math]. And this kind of ad-hoc viewpoint will not suffice for what we want to do here.
In short, time for a crash course on probability. We will be following the book of Voiculescu, Dykema, Nica [1], and we will be quite brief, because at the core of classical and free probability are the Gaussian laws, real and complex, and the semicircular and circular laws, that we already know about, from chapters 5-6. Thus, our job will be basically that of putting what we know in a more conceptual framework.
Let us first talk about classical probability. The starting point here is:
Let [math]X[/math] be a probability space.
- The real functions [math]f\in L^\infty(X)[/math] are called random variables.
- The moments of such a variable are the numbers [math]M_k(f)=\mathbb E(f^k)[/math].
- The law of such a variable is the measure given by [math]M_k(f)=\int_\mathbb Rx^kd\mu_f(x)[/math].
Here the fact that [math]\mu_f[/math] exists indeed is not trivial. By linearity, we would like to have a real probability measure satisfying the following formula, for any [math]P\in\mathbb R[X][/math]:
[math]\mathbb E(P(f))=\int_\mathbb RP(x)d\mu_f(x)[/math]
By using a continuity argument, it is enough to have this for the characteristic functions [math]\chi_I[/math] of the measurable sets [math]I\subset\mathbb R[/math]. Thus, we would like to have [math]\mu_f[/math] such that:
[math]\mathbb E(\chi_I(f))=\mu_f(I)[/math]
But this latter formula can serve as a definition for [math]\mu_f[/math], so we are done. Next in line, we need to talk about independence. Once again with the idea of doing things a bit abstractly, and most adapted to what we want to do here, the definition is as follows:
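As an elementary numerical illustration of these notions, not from the text, and with the probability space and variable being our own choices: take [math]X=[0,1][/math] with the Lebesgue measure and [math]f(x)=x[/math]; then [math]M_k(f)=1/(k+1)[/math], which are exactly the moments of the uniform measure [math]\mu_f[/math] on [math][0,1][/math].

```python
# Sketch, not from the text: moments of f(x) = x on X = [0, 1] with the
# Lebesgue measure; the law mu_f is the uniform measure on [0, 1], whose
# k-th moment is 1 / (k + 1).

def moment(k, n=100000):
    # midpoint-rule approximation of E(f^k) = int_0^1 x^k dx
    return sum(((i + 0.5) / n) ** k for i in range(n)) / n

for k in range(1, 6):
    assert abs(moment(k) - 1.0 / (k + 1)) < 1e-8
```

This is of course the simplest possible example; any bounded variable works the same way, with its moments pinning down its law.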
Two variables [math]f,g\in L^\infty(X)[/math] are called independent when the following condition is satisfied, for any [math]k,l\in\mathbb N[/math]:
[math]\mathbb E(f^kg^l)=\mathbb E(f^k)\mathbb E(g^l)[/math]
Again, this definition hides some non-trivial things. Indeed, by linearity, we would like to have a formula as follows, valid for any polynomials [math]P,Q\in\mathbb R[X][/math]:
[math]\mathbb E(P(f)Q(g))=\mathbb E(P(f))\mathbb E(Q(g))[/math]
By continuity, it is enough to have this for characteristic functions of type [math]\chi_I,\chi_J[/math], with [math]I,J\subset\mathbb R[/math]. Thus, we are led to the usual definition of independence, namely:
[math]\mathbb P(f\in I,g\in J)=\mathbb P(f\in I)\mathbb P(g\in J)[/math]
Here is now our first result, providing tools for the study of the independence:
Assume that [math]f,g\in L^\infty(X)[/math] are independent.
- We have [math]\mu_{f+g}=\mu_f*\mu_g[/math], where [math]*[/math] is the convolution of measures.
- We have [math]F_{f+g}=F_fF_g[/math], where [math]F_f(x)=\mathbb E(e^{ixf})[/math] is the Fourier transform.
This is something very standard, the idea being as follows:
(1) We have the following computation, using the independence of [math]f,g[/math]:
[math]M_k(f+g)=\mathbb E((f+g)^k)=\sum_r\binom{k}{r}\mathbb E(f^rg^{k-r})=\sum_r\binom{k}{r}M_r(f)M_{k-r}(g)[/math]
On the other hand, by using the Fubini theorem, we have as well:
[math]\int_\mathbb Rx^kd(\mu_f*\mu_g)(x)=\int_{\mathbb R\times\mathbb R}(x+y)^kd\mu_f(x)d\mu_g(y)=\sum_r\binom{k}{r}M_r(f)M_{k-r}(g)[/math]
Thus the measures [math]\mu_{f+g}[/math] and [math]\mu_f*\mu_g[/math] have the same moments, and so coincide.
(2) We have indeed the following computation, using (1) and Fubini:
[math]F_{f+g}(x)=\mathbb E(e^{ix(f+g)})=\mathbb E(e^{ixf}e^{ixg})=\mathbb E(e^{ixf})\mathbb E(e^{ixg})=F_f(x)F_g(x)[/math]
Thus, we are led to the conclusion in the statement.
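Both linearization properties above can be checked numerically. Here is a Monte Carlo sketch, not from the text, with the uniform law on [math][-1,1][/math] and all parameters being our own choices:

```python
import random, cmath

# Sketch, not from the text: f, g independent uniform on [-1, 1]; we check
# the multiplicativity F_{f+g}(x) = F_f(x) F_g(x) on empirical samples.
random.seed(0)
N = 200000
f = [random.uniform(-1, 1) for _ in range(N)]
g = [random.uniform(-1, 1) for _ in range(N)]

def F(samples, x):
    # empirical Fourier transform F(x) = E(exp(ixf))
    return sum(cmath.exp(1j * x * s) for s in samples) / len(samples)

x = 1.3
lhs = F([a + b for a, b in zip(f, g)], x)
rhs = F(f, x) * F(g, x)
assert abs(lhs - rhs) < 0.01
```

The agreement is up to the usual Monte Carlo error of order [math]1/\sqrt N[/math], as it should be.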
Let us discuss now the normal distributions. We have here:
The normal law of parameter [math]t \gt 0[/math] is the following measure:
[math]g_t=\frac{1}{\sqrt{2\pi t}}e^{-x^2/2t}dx[/math]
As a first remark, the above law has indeed mass 1, as it should. This follows indeed from the Gauss formula, which gives, with [math]x=y/\sqrt{2t}[/math]:
[math]\int_\mathbb Re^{-y^2/2t}dy=\sqrt{2t}\int_\mathbb Re^{-x^2}dx=\sqrt{2t}\times\sqrt{\pi}=\sqrt{2\pi t}[/math]
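The mass 1 property can also be confirmed numerically, as a sanity check. Here is a small quadrature sketch, not from the text, with the truncation window and step being our own choices:

```python
import math

def gauss_mass(t, lo=-50.0, hi=50.0, n=200001):
    # trapezoid-rule approximation of the total mass of the normal law
    # g_t = (1 / sqrt(2 pi t)) exp(-x^2 / 2t) dx
    h = (hi - lo) / (n - 1)
    c = 1.0 / math.sqrt(2 * math.pi * t)
    vals = [c * math.exp(-(lo + i * h) ** 2 / (2 * t)) for i in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

for t in (0.5, 1.0, 3.0):
    assert abs(gauss_mass(t) - 1.0) < 1e-9
```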
Generally speaking, the normal laws appear a bit everywhere, in real life. The reasons behind this phenomenon come from the Central Limit Theorem (CLT), that we will explain in a moment, after developing the needed general theory. We first have:
We have the following formula, for any [math]t \gt 0[/math]:
[math]F_{g_t}(x)=e^{-tx^2/2}[/math]
In particular, the normal laws form a convolution semigroup: [math]g_s*g_t=g_{s+t}[/math], for any [math]s,t \gt 0[/math].
The Fourier transform formula can be established as follows:
[math]F_{g_t}(x)=\frac{1}{\sqrt{2\pi t}}\int_\mathbb Re^{-y^2/2t+ixy}dy=\frac{1}{\sqrt{2\pi t}}\int_\mathbb Re^{-(y-itx)^2/2t}e^{-tx^2/2}dy=e^{-tx^2/2}[/math]
As for the last assertion, this follows from the linearization result from Theorem 8.3 (2) above, because [math]\log F_{g_t}[/math] is linear in [math]t[/math].
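The formula [math]F_{g_t}(x)=e^{-tx^2/2}[/math] can be confirmed by direct quadrature. A sketch, not from the text, with the truncation and step sizes being our own choices; by symmetry the imaginary part of the transform vanishes, so only the cosine is integrated:

```python
import math

def F_gauss(t, x, lo=-50.0, hi=50.0, n=200001):
    # numerical Fourier transform E(exp(ixf)) of the normal law g_t
    h = (hi - lo) / (n - 1)
    c = 1.0 / math.sqrt(2 * math.pi * t)
    total = 0.0
    for i in range(n):
        y = lo + i * h
        w = 1.0 if 0 < i < n - 1 else 0.5   # trapezoid weights
        total += w * c * math.exp(-y * y / (2 * t)) * math.cos(x * y)
    return h * total

for t, x in [(1.0, 0.7), (2.0, 1.5)]:
    assert abs(F_gauss(t, x) - math.exp(-t * x * x / 2)) < 1e-9
```

The semigroup property is then visible on the transforms, since [math]e^{-sx^2/2}e^{-tx^2/2}=e^{-(s+t)x^2/2}[/math].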
We are now ready to state and prove the CLT, as follows:
Given random variables [math]f_1,f_2,f_3,\ldots\in L^\infty(X)[/math] which are i.i.d., centered, and with variance [math]t \gt 0[/math], we have, with [math]n\to\infty[/math], in moments,
[math]\frac{1}{\sqrt n}\sum_{i=1}^nf_i\sim g_t[/math]
We have the following formula for [math]F_f(x)=\mathbb E(e^{ixf})[/math], in terms of moments:
[math]F_f(x)=\sum_{k=0}^\infty\frac{i^kM_k(f)}{k!}x^k[/math]
Thus, the Fourier transform of the variable in the statement is:
[math]F(x)=\left(F_{f_1}\left(\frac{x}{\sqrt n}\right)\right)^n\simeq\left(1-\frac{tx^2}{2n}\right)^n\simeq e^{-tx^2/2}[/math]
But this latter function being the Fourier transform of [math]g_t[/math], we obtain the result.
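The CLT can be watched at work numerically. Here is a Monte Carlo sketch, not from the text, with the uniform law on [math][-1,1][/math] (centered, of variance [math]t=1/3[/math]) and all sample sizes being our own choices; the rescaled sums should have second and fourth moments close to those of [math]g_t[/math], namely [math]t[/math] and [math]3t^2[/math]:

```python
import random

# Sketch, not from the text: CLT for i.i.d. uniform variables on [-1, 1]
random.seed(1)
n, N, t = 100, 20000, 1.0 / 3.0

# N samples of (f_1 + ... + f_n) / sqrt(n)
xs = [sum(random.uniform(-1, 1) for _ in range(n)) / n ** 0.5 for _ in range(N)]
m2 = sum(x ** 2 for x in xs) / N
m4 = sum(x ** 4 for x in xs) / N

assert abs(m2 - t) < 0.02          # second moment of g_t is t
assert abs(m4 - 3 * t * t) < 0.06  # fourth moment of g_t is 3 t^2
```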
Let us record as well the complex version of the CLT. This is as follows:
Given variables [math]f_1,f_2,f_3,\ldots\in L^\infty(X)[/math] whose real and imaginary parts are i.i.d., centered, and with variance [math]t \gt 0[/math], we have, with [math]n\to\infty[/math], in moments,
[math]\frac{1}{\sqrt n}\sum_{i=1}^nf_i\sim G_t[/math]
with [math]G_t[/math] being the complex Gaussian law of parameter [math]t[/math].
This is clear from Theorem 8.6, by taking real and imaginary parts.
In the noncommutative setting now, the starting definition is as follows:
Let [math]A[/math] be a [math]C^*[/math]-algebra, given with a trace [math]tr[/math].
- The elements [math]a\in A[/math] are called random variables.
- The moments of such a variable are the numbers [math]M_k(a)=tr(a^k)[/math].
- The law of such a variable is the functional [math]\mu:P\to tr(P(a))[/math].
Here [math]k=\circ\bullet\bullet\circ\ldots[/math] is as usual a colored integer, and the powers [math]a^k[/math] are defined by multiplicativity and the usual formulae, namely:
[math]a^\emptyset=1\quad,\quad a^\circ=a\quad,\quad a^\bullet=a^*[/math]
As for the polynomial [math]P[/math], this is a noncommuting [math]*[/math]-polynomial in one variable:
[math]P\in\mathbb C \lt X,X^* \gt [/math]
Observe that the law is uniquely determined by the moments, because:
[math]P=\sum_k\lambda_kX^k\implies\mu(P)=\sum_k\lambda_kM_k(a)[/math]
Generally speaking, the above definition is something quite abstract, but there is no other way of doing things, at least at this level of generality. We have indeed:
Given a [math]C^*[/math]-algebra with a faithful trace [math](A,tr)[/math], any normal variable, [math]aa^*=a^*a[/math], has a law, in the sense that there is a probability measure [math]\mu\in\mathcal P(\mathbb C)[/math] such that:
[math]tr(P(a))=\int_\mathbb CP(x)d\mu(x)[/math]
In the non-normal case, [math]aa^*\neq a^*a[/math], such a measure does not exist.
This is something that we know from chapter 6, coming from the Gelfand theorem, which gives [math] \lt a \gt =C(\sigma(a))[/math], and the Riesz theorem. As for the last assertion, we know this too from chapter 6, coming via [math]tr(aa^*aa^*) \gt tr(aaa^*a^*)[/math] for [math]aa^*\neq a^*a[/math].
Let us discuss now the independence, and its noncommutative versions, in the above setting. As a starting point here, we have the following notion:
Two subalgebras [math]B,C\subset A[/math] are called independent when the following condition is satisfied, for any [math]b\in B[/math] and [math]c\in C[/math]:
[math]tr(b)=tr(c)=0\implies tr(bc)=0[/math]
Equivalently, the following condition must be satisfied, for any [math]b\in B[/math] and [math]c\in C[/math]:
[math]tr(bc)=tr(b)tr(c)[/math]
Observe that the above two conditions are indeed equivalent. In one sense this is clear, and in the other sense, with [math]a'=a-tr(a)[/math], this follows from:
[math]tr(bc)=tr[(b'+tr(b))(c'+tr(c))]=tr(b'c')+tr(b)tr(c)=tr(b)tr(c)[/math]
The other remark is that the above notion generalizes indeed the usual notion of independence, from the classical case, the result here being as follows:
Given two compact measured spaces [math]Y,Z[/math], the algebras [math]C(Y),C(Z)[/math] are independent inside [math]C(Y\times Z)[/math]. Conversely, any pair of independent subalgebras [math]B,C\subset A[/math], with [math]A[/math] commutative, appears in this way.
We have two assertions here, the idea being as follows:
(1) First of all, given two arbitrary compact spaces [math]Y,Z[/math], we have embeddings of algebras as in the statement, defined by the following formulae:
[math]f\to[(y,z)\to f(y)]\quad,\quad g\to[(y,z)\to g(z)][/math]
In the measured space case now, the Fubini theorem tells us that:
[math]\int_{Y\times Z}f(y)g(z)d(y,z)=\int_Yf(y)dy\int_Zg(z)dz[/math]
Thus, the algebras [math]C(Y),C(Z)[/math] are independent in the sense of Definition 8.10.
(2) Conversely now, assume that [math]B,C\subset A[/math] are independent, with [math]A[/math] being commutative. Let us write our algebras as follows, with [math]X,Y,Z[/math] being certain compact spaces:
[math]A=C(X)\quad,\quad B=C(Y)\quad,\quad C=C(Z)[/math]
In this picture, the inclusions [math]B,C\subset A[/math] must come from quotient maps, as follows:
[math]p:X\to Y\quad,\quad q:X\to Z[/math]
Regarding now the independence condition from Definition 8.10, in the above picture, this tells us that the following equality must happen:
[math]\int_Xf(p(x))g(q(x))dx=\int_Xf(p(x))dx\int_Xg(q(x))dx[/math]
Thus we are in a Fubini type situation, and we obtain from this [math]Y\times Z\subset X[/math]. Thus, the independence of [math]B,C\subset A[/math] appears as in (1) above.
It is possible to develop some theory here, but this is ultimately not very interesting. As a much more interesting notion now, we have Voiculescu's freeness [1]:
Two subalgebras [math]B,C\subset A[/math] are called free when the following condition is satisfied, for any [math]b_i\in B[/math] and [math]c_i\in C[/math]:
[math]tr(b_i)=tr(c_i)=0\implies tr(b_1c_1b_2c_2\ldots)=0[/math]
As a first observation, of theoretical nature, there is actually a certain lack of symmetry between Definition 8.10 and Definition 8.12, because in contrast to the former, the latter does not include an explicit formula for the quantities of the following type:
[math]tr(b_1c_1b_2c_2\ldots)[/math]
However, this is not an issue, and is simply due to the fact that the formula in the free case is something more complicated, the result being as follows:
Assuming that [math]B,C\subset A[/math] are free, the restriction of [math]tr[/math] to [math] \lt B,C \gt [/math] can be computed in terms of the restrictions of [math]tr[/math] to [math]B,C[/math]. To be more precise, we have a formula of the following type, with [math]P[/math] being certain polynomials:
[math]tr(b_1c_1b_2c_2\ldots)=P\left(\{tr(b_{i_1}b_{i_2}\ldots)\}_i,\{tr(c_{j_1}c_{j_2}\ldots)\}_j\right)[/math]
This is something that we know from chapter 6, which is based on a computation which is similar to that made after Definition 8.10.
Let us discuss now some models for independence and freeness. We first have:
Given two algebras [math](B,tr)[/math] and [math](C,tr)[/math], the following hold:
- [math]B,C[/math] are independent inside their tensor product [math]B\otimes C[/math], endowed with its canonical tensor product trace, given on basic tensors by [math]tr(b\otimes c)=tr(b)tr(c)[/math].
- [math]B,C[/math] are free inside their free product [math]B*C[/math], endowed with its canonical free product trace, given by the formulae in Proposition 8.13.
Both the assertions are clear from definitions, as follows:
(1) This is clear with either of the definitions of the independence, from Definition 8.10 above, because we have by construction of the trace:
[math]tr[(b\otimes1)(1\otimes c)]=tr(b\otimes c)=tr(b)tr(c)[/math]
(2) This is clear from definitions, the only point being that of showing that the notion of freeness, or the recurrence formulae in Proposition 8.13, can be used in order to construct a canonical free product trace, on the free product of the two algebras involved:
[math]tr:B*C\to\mathbb C[/math]
But this can be checked for instance by using a GNS construction. Indeed, consider the GNS constructions for the algebras [math](B,tr)[/math] and [math](C,tr)[/math]:
[math]B\to B(l^2(B))\quad,\quad C\to B(l^2(C))[/math]
By taking the free product of these representations, we obtain a representation as follows, with the [math]*[/math] symbol on the right being a free product of pointed Hilbert spaces:
[math]B*C\to B(l^2(B)*l^2(C))[/math]
Now by composing with the linear form [math]T\to \lt T\xi,\xi \gt [/math], where [math]\xi=1_B=1_C[/math] is the common distinguished vector of [math]l^2(B)[/math] and [math]l^2(C)[/math], we obtain a linear form, as follows:
[math]tr:B*C\to\mathbb C\quad,\quad tr(T)= \lt T\xi,\xi \gt [/math]
It is routine then to check that [math]tr[/math] is indeed a trace, and this is the “canonical free product trace” from the statement. Then, an elementary computation shows that [math]B,C[/math] are indeed free inside [math]B*C[/math], with respect to this trace, and this finishes the proof.
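The tensor product independence in (1) can be checked numerically on matrix algebras. A sketch, not from the text, with [math]B=M_2(\mathbb C)[/math], [math]C=M_3(\mathbb C)[/math] and random entries being our own choices, the trace being the normalized matrix trace:

```python
import numpy as np

# Sketch, not from the text: independence of B, C inside B (x) C, with
# B = M_2(C), C = M_3(C), and tr = Tr / dim the normalized trace.
rng = np.random.default_rng(0)
b = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
c = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

tr = lambda m: np.trace(m) / m.shape[0]

# the embedded copies are b (x) 1 and 1 (x) c, whose product is b (x) c,
# so independence reads tr(b (x) c) = tr(b) tr(c)
lhs = tr(np.kron(b, c))
rhs = tr(b) * tr(c)
assert abs(lhs - rhs) < 1e-12
```

Here the identity holds exactly, because the trace of a Kronecker product factorizes, which is precisely the algebraic content of (1).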
As a concrete application of the above results, still following [1], we have:
We have a free convolution operation [math]\boxplus[/math] for the distributions of the noncommutative random variables, given by the following formula, with [math]b,c[/math] free:
[math]\mu_b\boxplus\mu_c=\mu_{b+c}[/math]
Moreover, this operation restricts to an operation on the real probability measures.
We have several verifications to be performed here, as follows:
(1) We first have to check that given two variables [math]b,c[/math] which live respectively in certain [math]C^*[/math]-algebras [math]B,C[/math], we can recover them inside some [math]C^*[/math]-algebra [math]A[/math], with exactly the same distributions [math]\mu_b,\mu_c[/math], as to be able to sum them and then talk about [math]\mu_{b+c}[/math]. But this comes from Theorem 8.14, because we can set [math]A=B*C[/math], as explained there.
(2) The other verification which is needed is that of the fact that if [math]b,c[/math] are free, then the distribution [math]\mu_{b+c}[/math] depends only on the distributions [math]\mu_b,\mu_c[/math]. But for this purpose, we can use the general formula from Proposition 8.13, namely:
[math]tr(b_1c_1b_2c_2\ldots)=P\left(\{tr(b_{i_1}b_{i_2}\ldots)\}_i,\{tr(c_{j_1}c_{j_2}\ldots)\}_j\right)[/math]
Here [math]P[/math] is a certain polynomial, depending on the length of [math]b_1c_1b_2c_2\ldots[/math], having as variables the traces of products [math]b_{i_1}b_{i_2}\ldots[/math] and [math]c_{j_1}c_{j_2}\ldots[/math], with [math]i_1 \lt i_2 \lt \ldots[/math] and [math]j_1 \lt j_2 \lt \ldots[/math].
Now by plugging in arbitrary powers of [math]b,c[/math] as variables [math]b_i,c_j[/math], we obtain a family of formulae of the following type, with [math]Q_k[/math] being certain polynomials:
[math]tr((b+c)^k)=Q_k\left(\{tr(b^r)\}_r,\{tr(c^s)\}_s\right)[/math]
Thus the moments of [math]b+c[/math] depend only on the moments of [math]b,c[/math], with of course colored exponents in all this, according to our moment conventions, and this gives the result.
(3) Finally, in what regards the last assertion, regarding the real measures, this is clear from the fact that if [math]b,c[/math] are self-adjoint, then so is their sum [math]b+c[/math].
We would like now to have a linearization result for [math]\boxplus[/math], in the spirit of the previous result for [math]*[/math]. We will do this slowly, in several steps. As a first observation, both independence and freeness are nicely modelled inside group algebras, as follows:
We have the following results, valid for group algebras:
- [math]C^*(\Gamma),C^*(\Lambda)[/math] are independent inside [math]C^*(\Gamma\times\Lambda)[/math].
- [math]C^*(\Gamma),C^*(\Lambda)[/math] are free inside [math]C^*(\Gamma*\Lambda)[/math].
In order to prove these results, we have two possible methods:
(1) We can use here the general results in Theorem 8.14 above, along with the following two isomorphisms, which are both standard:
[math]C^*(\Gamma\times\Lambda)=C^*(\Gamma)\otimes C^*(\Lambda)\quad,\quad C^*(\Gamma*\Lambda)=C^*(\Gamma)*C^*(\Lambda)[/math]
(2) We can prove this directly as well, by using the fact that each group algebra is spanned by the corresponding group elements. Indeed, it is enough to check the independence and freeness formulae on group elements, which is in turn trivial.
Regarding now the linearization problem for [math]\boxplus[/math], the situation here is quite tricky. We need good models for the pairs of free random variables [math](b,c)[/math], and the problem is that the models that we have will basically lead us into the combinatorics from Proposition 8.13 and its proof, which cannot be solved with bare hands, and which we want to avoid.
The idea will be that of temporarily lifting the self-adjointness assumption on our variables [math]b,c[/math], and looking instead for arbitrary random variables [math]\beta,\gamma[/math], not necessarily self-adjoint, modelling in integer moments our given laws [math]\mu,\nu\in\mathcal P(\mathbb R)[/math], as follows:
[math]tr(\beta^k)=\int_\mathbb Rx^kd\mu(x)\quad,\quad tr(\gamma^k)=\int_\mathbb Rx^kd\nu(x)[/math]
To be more precise, assuming that [math]\beta,\gamma[/math] are indeed not self-adjoint, the above formulae are not the general formulae for [math]\beta,\gamma[/math], simply because these latter formulae involve colored integers [math]k=\circ\bullet\bullet\circ\ldots[/math] as exponents. Thus, in the context of the above formulae, [math]\mu,\nu[/math] are not the distributions of [math]\beta,\gamma[/math], but just some “pieces” of these distributions.
Now with this tricky idea in mind, due to Voiculescu [1], the solution to our law modelling problem comes in a quite straightforward way, involving the good old Hilbert space [math]H=l^2(\mathbb N)[/math] and the good old shift operator [math]S\in B(H)[/math], as follows:
Consider the shift operator on the space [math]H=l^2(\mathbb N)[/math], given by:
[math]S(e_i)=e_{i+1}[/math]
The variables of the following type, with [math]f\in\mathbb C[X][/math] being a polynomial, model then, in integer moments, all the distributions [math]\mu\in\mathcal P(\mathbb R)[/math]:
[math]T=S^*+f(S)[/math]
The adjoint of the shift is given by the following formula:
[math]S^*(e_i)=\begin{cases}e_{i-1}&{\rm if}\ i \gt 0\\ 0&{\rm if}\ i=0\end{cases}[/math]
Consider now a variable as in the statement, namely:
[math]T=S^*+a_0+a_1S+a_2S^2+\ldots[/math]
We have then [math]tr(T)=a_0[/math], then [math]tr(T^2)[/math] will involve [math]a_1[/math], then [math]tr(T^3)[/math] will involve [math]a_2[/math], and so on. Thus, we are led to a certain recurrence, that we will not attempt to solve now, with bare hands, but which definitely gives the conclusion in the statement.
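The beginning of this recurrence can be made explicit numerically, by truncating the shift to a finite matrix. A sketch, not from the text, with the truncation size and the coefficients [math]a_0,a_1,a_2[/math] being our own choices; a direct path count gives [math]tr(T^2)=a_0^2+a_1[/math] and [math]tr(T^3)=a_0^3+3a_0a_1+a_2[/math], which the code confirms:

```python
import numpy as np

# Sketch, not from the text: truncated shift on l^2(N), with the trace
# tr(T) = <T e_0, e_0>, and T = S^* + a_0 + a_1 S + a_2 S^2.
n = 64
S = np.diag(np.ones(n - 1), -1)                 # S e_i = e_{i+1}
a = [0.5, -1.0, 2.0]                            # our own choice of f
T = S.T + sum(ak * np.linalg.matrix_power(S, k) for k, ak in enumerate(a))
tr = lambda m: m[0, 0]

assert abs(tr(T) - a[0]) < 1e-12                          # tr(T)   = a_0
assert abs(tr(T @ T) - (a[0] ** 2 + a[1])) < 1e-12        # tr(T^2) involves a_1
assert abs(tr(T @ T @ T)
           - (a[0] ** 3 + 3 * a[0] * a[1] + a[2])) < 1e-12  # tr(T^3) involves a_2
```

Since low powers of [math]T[/math] never reach the truncation level, these moments are exact, not approximate.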
Before getting further, let us point out the following fundamental fact:
In the context of the above correspondence, the variable [math]T=S+S^*[/math] follows the Wigner semicircle law [math]\gamma_1[/math].
This is something that we know from chapter 6, the idea being that the combinatorics of [math](S+S^*)^k[/math] leads us into paths on [math]\mathbb N[/math], and to the Catalan numbers.
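The Catalan number combinatorics can be verified directly on a truncated shift matrix. A sketch, not from the text, with the truncation size being our own choice:

```python
import numpy as np
from math import comb

# Sketch, not from the text: the even moments of S + S^* are the Catalan
# numbers C_k = binom(2k, k) / (k + 1), the moments of the semicircle law.
n = 64
S = np.diag(np.ones(n - 1), -1)    # truncated shift: S e_i = e_{i+1}
T = S + S.T
tr = lambda m: m[0, 0]             # tr(T) = <T e_0, e_0>

P = np.eye(n)
for k in range(1, 8):
    P = P @ T @ T                  # P = T^(2k)
    catalan = comb(2 * k, k) // (k + 1)
    assert abs(tr(P) - catalan) < 1e-9
```

The entry [math](T^{2k})_{00}[/math] counts the length [math]2k[/math] paths on [math]\mathbb N[/math] starting and ending at 0, which is exactly the Catalan number.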
Getting back now to our linearization program for [math]\boxplus[/math], the next step is that of taking a free product of the model found in Theorem 8.17 with itself. We have here:
We can define the algebra [math]A(H)[/math] of creation operators [math]S_x:v\to x\otimes v[/math] on the free Fock space associated to a real Hilbert space [math]H[/math], namely:
[math]F(H)=\mathbb C\Omega\oplus H\oplus H^{\otimes2}\oplus\ldots[/math]
As basic examples here:
- With [math]H=\mathbb C[/math] we recover the shift algebra [math]A= \lt S \gt [/math] on [math]F(H)=l^2(\mathbb N)[/math].
- With [math]H=\mathbb C^2[/math], we obtain the algebra [math]A= \lt S_1,S_2 \gt [/math] on [math]F(H)=l^2(\mathbb N*\mathbb N)[/math].
We can talk indeed about the algebra [math]A(H)[/math] of creation operators on the free Fock space [math]F(H)[/math] associated to a real Hilbert space [math]H[/math], with the remark that, in terms of the abstract semigroup notions from chapter 6 above, we have:
[math]F(\mathbb C^k)=l^2(\mathbb N^{*k})[/math]
Thus, we are led to the conclusions in the statement.
With the above notions in hand, we have the following key freeness result:
Given a real Hilbert space [math]H[/math], and two orthogonal vectors [math]x\perp y[/math], the corresponding creation operators [math]S_x[/math] and [math]S_y[/math] are free with respect to the trace associated to the vacuum vector [math]\Omega[/math], namely:
[math]tr(T)= \lt T\Omega,\Omega \gt [/math]
This is something that we know from chapter 6, coming from the formula [math]S_x^*S_y= \lt x,y \gt id[/math], valid for any two vectors [math]x,y\in H[/math], which itself is elementary.
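The key formula [math]S_x^*S_y= \lt x,y \gt id[/math] can be checked on a finite-dimensional truncation of the free Fock space. A sketch, not from the text, with the dimension, depth and vectors being our own choices; the basis consists of the vacuum and the words of length at most `depth`:

```python
import itertools
import numpy as np

# Sketch, not from the text: truncated free Fock space F(R^d), with basis
# given by the words in {0, ..., d-1} of length <= depth (the empty word
# being the vacuum), and the creation operators S_x : w -> x (x) w.
d, depth = 2, 3
words = [w for L in range(depth + 1) for w in itertools.product(range(d), repeat=L)]
idx = {w: i for i, w in enumerate(words)}
N = len(words)

def creation(x):
    M = np.zeros((N, N))
    for w in words:
        if len(w) < depth:                  # cut off at the top level
            for i in range(d):
                M[idx[(i,) + w], idx[w]] += x[i]
    return M

x = np.array([1.0, 0.0])
y = np.array([0.6, 0.8])                    # |y| = 1, <x, y> = 0.6
Sx, Sy = creation(x), creation(y)

# S_x^* S_y = <x, y> id, checked below the truncation level
for w in words:
    if len(w) < depth:
        v = np.zeros(N); v[idx[w]] = 1.0
        assert np.allclose(Sx.T @ (Sy @ v), np.dot(x, y) * v)
```

With orthogonal vectors, [math] \lt x,y \gt =0[/math], this gives [math]S_x^*S_y=0[/math], which is the starting point for the freeness computation.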
With this technology in hand, let us go back to our linearization program for [math]\boxplus[/math]. We have the following key result, further building on Proposition 8.20:
Given two polynomials [math]f,g\in\mathbb C[X][/math], consider the variables
[math]X=S^*+f(S)\quad,\quad Y=T^*+g(T)[/math]
where [math]S,T[/math] are the creation operators associated to two orthogonal norm-one vectors. These variables are then free, and their sum [math]X+Y[/math] has the same law as the following variable, with [math]U[/math] being the shift on [math]l^2(\mathbb N)[/math]:
[math]U^*+(f+g)(U)[/math]
Again, this is something that we know from chapter 6, the idea being that this comes from Proposition 8.20, by using a [math]45^\circ[/math] rotation trick.
We can now solve the linearization problem. Following Voiculescu [1], we have:
Given a real probability measure [math]\mu[/math], define its [math]R[/math]-transform as follows:
[math]G_\mu(\xi)=\int_\mathbb R\frac{d\mu(x)}{\xi-x}\implies R_\mu(z)=G_\mu^{-1}(z)-z^{-1}[/math]
The free convolution operation is then linearized by the [math]R[/math]-transform, in the sense that we have, for any real probability measures [math]\mu,\nu[/math]:
[math]R_{\mu\boxplus\nu}=R_\mu+R_\nu[/math]
This can be done by using the above results, in several steps, as follows:
(1) According to Theorem 8.21, the operation [math]\mu\to f[/math] from Theorem 8.17 linearizes the free convolution operation [math]\boxplus[/math]. We are therefore left with a computation inside [math]C^*(\mathbb N)[/math]. To be more precise, consider a variable as in Theorem 8.21:
[math]X=S^*+f(S)[/math]
In order to establish the result, we must prove that the [math]R[/math]-transform of [math]X[/math], constructed according to the procedure in the statement, is the function [math]f[/math] itself.
(2) In order to do so, fix [math]|z| \lt 1[/math] in the complex plane, and let us set:
[math]\xi=\sum_{k=0}^\infty z^ke_k[/math]
The shift and its adjoint act then as follows, on this vector:
[math]S\xi=z^{-1}(\xi-e_0)\quad,\quad S^*\xi=z\xi[/math]
It follows that the adjoint of our operator [math]X[/math] acts as follows on this vector:
[math]X^*\xi=S\xi+f(S^*)\xi=z^{-1}(\xi-e_0)+f(z)\xi[/math]
Now observe that this formula can be written as follows:
[math]\left(z^{-1}+f(z)-X^*\right)\xi=z^{-1}e_0[/math]
The point now is that when [math]|z|[/math] is small, the operator appearing on the left is invertible. Thus, we can rewrite this formula as follows:
[math]\xi=z^{-1}\left(z^{-1}+f(z)-X^*\right)^{-1}e_0[/math]
Now by applying the trace, we are led to the following formula:
[math]tr\left[\left(z^{-1}+f(z)-X^*\right)^{-1}\right]=z[/math]
(3) Let us apply now the complex function procedure in the statement to the real probability measure [math]\mu[/math] modelled by [math]X[/math]. The Cauchy transform [math]G_\mu[/math] is given by:
[math]G_\mu(\xi)=tr\left((\xi-X)^{-1}\right)[/math]
Now observe that, with the choice [math]\xi=z^{-1}+f(z)[/math] for our complex variable, the trace formula found in (2) above tells us precisely that we have:
[math]G_\mu(z^{-1}+f(z))=z[/math]
Thus, we have [math]R_\mu(z)=f(z)[/math], which finishes the proof, as explained in step (1).
With the above linearization technology in hand, we can now establish the following free analogue of the CLT, also due to Voiculescu [1]:
Given self-adjoint variables [math]x_1,x_2,x_3,\ldots[/math] which are f.i.d., centered, with variance [math]t \gt 0[/math], we have, with [math]n\to\infty[/math], in moments,
[math]\frac{1}{\sqrt n}\sum_{i=1}^nx_i\sim\gamma_t[/math]
We follow the same idea as in the proof of the CLT:
(1) At [math]t=1[/math], the [math]R[/math]-transform of the variable in the statement on the left can be computed by using the linearization property from Theorem 8.22, and is given by:
[math]R(z)=\sqrt n\,R_{x_1}\left(\frac{z}{\sqrt n}\right)\to z[/math]
(2) Regarding now the right term, also at [math]t=1[/math], our claim is that the [math]R[/math]-transform of the Wigner semicircle law [math]\gamma_1[/math] is given by the following formula:
[math]R_{\gamma_1}(z)=z[/math]
But this follows via some calculus, or directly from the following formula, coming from Proposition 8.18, and from the technical details of the [math]R[/math]-transform:
[math]S+S^*\sim\gamma_1[/math]
Thus, the laws in the statement have the same [math]R[/math]-transforms, and so they are equal.
(3) Summarizing, we have proved the free CLT at [math]t=1[/math]. The passage to the general case, [math]t \gt 0[/math], is routine, by some standard dilation computations.
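The fact that the semicircle law [math]\gamma_1[/math] has [math]R[/math]-transform [math]R(z)=z[/math] can be confirmed numerically: it amounts to the identity [math]G_{\gamma_1}(z^{-1}+z)=z[/math] for the Cauchy transform. A quadrature sketch, not from the text, with the step size and test points being our own choices:

```python
import math

# Sketch, not from the text: gamma_1 has density sqrt(4 - x^2) / (2 pi)
# on [-2, 2]; we check G(1/z + z) = z, i.e. R(z) = G^{-1}(z) - 1/z = z.

def G_semicircle(xi, n=200000):
    # midpoint-rule approximation of G(xi) = int density(x) / (xi - x) dx
    h = 4.0 / n
    total = 0.0
    for i in range(n):
        x = -2.0 + (i + 0.5) * h
        total += math.sqrt(4.0 - x * x) / (xi - x)
    return total * h / (2.0 * math.pi)

for z in (0.1, 0.2, 0.3):
    assert abs(G_semicircle(1.0 / z + z) - z) < 1e-4
```

Together with [math]R_{\mu\boxplus\nu}=R_\mu+R_\nu[/math], this also recovers the semigroup property [math]\gamma_s\boxplus\gamma_t=\gamma_{s+t}[/math] of the semicircle laws.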
Similarly, in the complex case, we have the following result, also from [1]:
Given variables [math]x_1,x_2,x_3,\ldots,[/math] whose real and imaginary parts are f.i.d., centered, and with variance [math]t \gt 0[/math], we have, with [math]n\to\infty[/math], in moments,
[math]\frac{1}{\sqrt n}\sum_{i=1}^nx_i\sim\Gamma_t[/math]
with [math]\Gamma_t[/math] being the Voiculescu circular law of parameter [math]t[/math].
This is clear from Theorem 8.23, by taking real and imaginary parts.
There are of course many other things that can be said about [math]g_t,\gamma_t,G_t,\Gamma_t[/math], but for the moment, this is all we need. We will be back later to these laws, with more details.
General references
Banica, Teo (2024). "Introduction to quantum groups". arXiv:1909.08152 [math.CO].