Normal laws
1a. Probability theory
Generally speaking, probability theory is best learned by flipping coins and throwing dice. At a more advanced level, which is that of playing cards, we have:
The probabilities at poker are as follows:
- One pair: [math]0.533[/math].
- Two pairs: [math]0.120[/math].
- Three of a kind: [math]0.053[/math].
- Full house: [math]0.006[/math].
- Straight: [math]0.005[/math].
- Four of a kind: [math]0.001[/math].
- Flush: [math]0.000[/math].
- Straight flush: [math]0.000[/math].
Let us consider indeed our deck of 32 cards, [math]7,8,9,10,J,Q,K,A[/math]. The total number of possibilities for a poker hand is:
[math]\binom{32}{5}=201376[/math]
(1) For having a pair, the number of possibilities is:
[math]8\binom{4}{2}\binom{7}{3}\times4^3=107520[/math]
here by choosing the value of the pair, the suits of the pair, then the values of the remaining three cards, and then their suits. Thus, the probability of having a pair is:
[math]P=\frac{107520}{201376}\simeq0.533[/math]
(2) For having two pairs, the number of possibilities is:
[math]\binom{8}{2}\binom{4}{2}^2\times6\times4=24192[/math]
here by choosing the values of the two pairs, their suits, and then the remaining card. Thus, the probability of having two pairs is:
[math]P=\frac{24192}{201376}\simeq0.120[/math]
(3) For having three of a kind, the number of possibilities is:
[math]8\binom{4}{3}\binom{7}{2}\times4^2=10752[/math]
here by choosing the value and suits of the triple, and then the values and suits of the remaining two cards. Thus, the probability of having three of a kind is:
[math]P=\frac{10752}{201376}\simeq0.053[/math]
(4) For having a full house, the number of possibilities is:
[math]8\binom{4}{3}\times7\binom{4}{2}=1344[/math]
here by choosing the value and suits of the triple, and then the value and suits of the pair. Thus, the probability of having a full house is:
[math]P=\frac{1344}{201376}\simeq0.006[/math]
(5) For having a straight, the number of possibilities is:
Thus, the probability of having a straight is:
(6) For having four of a kind, the number of possibilities is:
[math]8\times28=224[/math]
here by choosing the value of the four cards, and then the remaining card. Thus, the probability of having four of a kind is:
[math]P=\frac{224}{201376}\simeq0.001[/math]
(7) For having a flush, the number of possibilities is:
Thus, the probability of having a flush is:
(8) For having a straight flush, the number of possibilities is:
[math]4\times4=16[/math]
here by choosing one of the four runs [math]7\ldots J[/math], [math]8\ldots Q[/math], [math]9\ldots K[/math], [math]10\ldots A[/math], and then the common suit. Thus, the probability of having a straight flush is:
[math]P=\frac{16}{201376}\simeq0.000[/math]
Thus, we have obtained the numbers in the statement.
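For the reader wanting to double-check all this, here is a minimal verification sketch in Python, with the hand counts being exactly the ones computed above, and with the probabilities printed with four decimals:
<syntaxhighlight lang="python">
from math import comb

total = comb(32, 5)  # all five-card hands from our 32-card deck: 201376

counts = {
    "one pair":        8 * comb(4, 2) * comb(7, 3) * 4**3,  # value, suits, three kickers
    "two pairs":       comb(8, 2) * comb(4, 2)**2 * 6 * 4,  # two values, suits, one kicker
    "three of a kind": 8 * comb(4, 3) * comb(7, 2) * 4**2,
    "full house":      8 * comb(4, 3) * 7 * comb(4, 2),
    "four of a kind":  8 * 28,                              # value, plus any remaining card
    "straight flush":  4 * 4,                               # four runs, four suits
}

for hand, n in counts.items():
    print(f"{hand:16s} {n:7d}  {n / total:.4f}")
</syntaxhighlight>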
Summarizing, probability is basically about binomials and factorials, and ultimately about numbers. We will see later that, in connection with more advanced questions, of continuous nature, some standard calculus comes into play as well.
Let us discuss now the general theory. The fundamental result in probability is the Central Limit Theorem (CLT), and our first task will be that of explaining this. With the idea in mind of doing things a bit abstractly, our starting point will be:
Let [math]X[/math] be a probability space, that is, a space with a probability measure, and with the corresponding integration denoted [math]E[/math], and called expectation.
- The random variables are the real functions [math]f\in L^\infty(X)[/math].
- The moments of such a variable are the numbers [math]M_k(f)=E(f^k)[/math].
- The law of such a variable is the measure [math]\mu_f[/math] given by [math]M_k(f)=\int_\mathbb{R}x^k\,d\mu_f(x)[/math].
Here the fact that [math]\mu_f[/math] exists indeed is well-known. By linearity, we would like to have a real probability measure such that the following formula holds, for any [math]P\in\mathbb{R}[X][/math]:
[math]E(P(f))=\int_\mathbb{R}P(x)\,d\mu_f(x)[/math]
By using a standard continuity argument, it is enough to have this formula for the characteristic functions [math]\chi_I[/math] of the measurable sets of real numbers [math]I\subset\mathbb{R}[/math]:
[math]E(\chi_I(f))=\int_\mathbb{R}\chi_I(x)\,d\mu_f(x)[/math]
But this latter formula, which reads [math]P(f\in I)=\mu_f(I)[/math], can serve as a definition for [math]\mu_f[/math], and we are done. Alternatively, assuming some familiarity with measure theory, [math]\mu_f[/math] is the push-forward of the probability measure on [math]X[/math], via the function [math]f:X\to\mathbb R[/math].
Next in line, we need to talk about independence. We can do this as follows:
Two variables [math]f,g\in L^\infty(X)[/math] are called independent when the following happens, for any [math]k,l\in\mathbb{N}[/math]:
[math]E(f^kg^l)=E(f^k)\,E(g^l)[/math]
Again, this definition hides some non-trivial things. Indeed, by linearity, we would like to have a formula as follows, valid for any polynomials [math]P,Q\in\mathbb{R}[X][/math]:
[math]E(P(f)Q(g))=E(P(f))\,E(Q(g))[/math]
By using a continuity argument, it is enough to have this formula for characteristic functions [math]\chi_I,\chi_J[/math] of the measurable sets of real numbers [math]I,J\subset\mathbb{R}[/math]:
[math]E(\chi_I(f)\chi_J(g))=E(\chi_I(f))\,E(\chi_J(g))[/math]
Thus, we are led to the usual definition of independence, namely:
[math]P(f\in I,\,g\in J)=P(f\in I)\,P(g\in J)[/math]
All this might seem a bit abstract, but in practice, the idea is of course that [math]f,g[/math] must be independent, in an intuitive, real-life sense. As a first result now, we have:
Assuming that [math]f,g\in L^\infty(X)[/math] are independent, we have
[math]\mu_{f+g}=\mu_f*\mu_g[/math]
where [math]*[/math] is the convolution of real probability measures.
We have the following computation, using the independence of [math]f,g[/math]:
[math]M_k(f+g)=E\left((f+g)^k\right)=\sum_{r=0}^k\binom{k}{r}M_r(f)M_{k-r}(g)[/math]
On the other hand, by using the Fubini theorem, we have as well:
[math]\int_\mathbb{R}x^k\,d(\mu_f*\mu_g)(x)=\int_{\mathbb{R}\times\mathbb{R}}(x+y)^k\,d\mu_f(x)\,d\mu_g(y)=\sum_{r=0}^k\binom{k}{r}M_r(f)M_{k-r}(g)[/math]
Thus [math]\mu_{f+g}[/math] and [math]\mu_f*\mu_g[/math] have the same moments, so they coincide, as desired.
Here is now a second result on independence, which is something more advanced:
Assuming that [math]f,g\in L^\infty(X)[/math] are independent, we have
[math]F_{f+g}=F_fF_g[/math]
where [math]F_f(x)=E(e^{ixf})[/math] denotes the Fourier transform.
We have the following computation, using Proposition 1.4 and Fubini:
[math]F_{f+g}(x)=\int_\mathbb{R}e^{ixz}\,d\mu_{f+g}(z)=\int_{\mathbb{R}\times\mathbb{R}}e^{ix(z+w)}\,d\mu_f(z)\,d\mu_g(w)=F_f(x)F_g(x)[/math]
Thus, we are led to the conclusion in the statement.
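As an illustration, the above multiplicativity property can be observed numerically. Here is a minimal Monte Carlo sketch in Python, with two independent uniform variables standing for [math]f,g[/math], these being bounded, and so in [math]L^\infty(X)[/math]:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
f = rng.uniform(-1.0, 1.0, n)  # bounded variable
g = rng.uniform(-2.0, 1.0, n)  # sampled independently of f

for x in (0.5, 1.0, 2.0):
    lhs = np.mean(np.exp(1j * x * (f + g)))                          # F_{f+g}(x)
    rhs = np.mean(np.exp(1j * x * f)) * np.mean(np.exp(1j * x * g))  # F_f(x) F_g(x)
    print(x, abs(lhs - rhs))  # differences of order 1e-3, the Monte Carlo error
</syntaxhighlight>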
This was for the foundations of probability theory, quickly explained. For further reading, a classical book is Feller [1]. A nice, more modern book is Durrett [2].
1b. Central limits
The main result in classical probability is the Central Limit Theorem (CLT), that we will explain now. Let us first discuss the normal distributions, which, as we will see, appear as limiting laws in the CLT. We will need the following standard result:
We have the following formula,
[math]\int_\mathbb{R}e^{-x^2}\,dx=\sqrt{\pi}[/math]
Let [math]I[/math] be the integral in the statement. By using polar coordinates, namely [math]x=r\cos t[/math], [math]y=r\sin t[/math], with the corresponding Jacobian being [math]r[/math], we have:
[math]I^2=\int_{\mathbb{R}^2}e^{-x^2-y^2}\,dxdy=\int_0^{2\pi}\int_0^\infty e^{-r^2}r\,drdt=2\pi\left[-\frac{e^{-r^2}}{2}\right]_0^\infty=\pi[/math]
Thus, we are led to the formula in the statement.
We can now introduce the normal distributions, as follows:
The normal law of parameter [math]1[/math] is the following measure:
[math]g_1=\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx[/math]
More generally, the normal law of parameter [math]t \gt 0[/math] is the following measure:
[math]g_t=\frac{1}{\sqrt{2\pi t}}\,e^{-x^2/2t}\,dx[/math]
The above laws are usually denoted [math]\mathcal N(0,1)[/math] and [math]\mathcal N(0,t)[/math], but since we will be doing all kinds of probability in this book, we will use simplified notations for all our measures. Let us mention as well that the normal laws traditionally have two parameters, the mean and the variance, but here we will not need the mean, all our theory using centered laws. Finally, observe that the above laws have indeed mass 1, as they should, due to the Gaussian integral formula above, applied with the change of variables [math]x=\sqrt{2t}\,y[/math]:
[math]\int_\mathbb{R}e^{-x^2/2t}\,dx=\sqrt{2\pi t}[/math]
Generally speaking, the normal laws appear a bit everywhere, in real life. The reasons for this come from the Central Limit Theorem (CLT), that we will explain in a moment, after developing some more general theory. As a first result, we have:
We have the variance formula
[math]V(g_t)=t[/math]
valid for any [math]t \gt 0[/math].
The first moment is 0, because our normal law [math]g_t[/math] is centered. As for the second moment, this can be computed as follows, by using partial integration:
[math]M_2=\frac{1}{\sqrt{2\pi t}}\int_\mathbb{R}x^2e^{-x^2/2t}\,dx=\frac{t}{\sqrt{2\pi t}}\int_\mathbb{R}e^{-x^2/2t}\,dx=t[/math]
We conclude from this that the variance is [math]V=M_2=t[/math].
Here is another result, which is widely useful in practice:
We have the following formula, valid for any [math]t \gt 0[/math]:
[math]F_{g_t}(x)=e^{-tx^2/2}[/math]
In particular, the normal laws satisfy [math]g_s*g_t=g_{s+t}[/math], for any [math]s,t \gt 0[/math].
The Fourier transform formula can be established as follows, by completing the square:
[math]F_{g_t}(x)=\frac{1}{\sqrt{2\pi t}}\int_\mathbb{R}e^{-y^2/2t}e^{ixy}\,dy=e^{-tx^2/2}\cdot\frac{1}{\sqrt{2\pi t}}\int_\mathbb{R}e^{-(y-itx)^2/2t}\,dy=e^{-tx^2/2}[/math]
As for [math]g_s*g_t=g_{s+t}[/math], this follows via Theorem 1.5, [math]\log F_{g_t}[/math] being linear in [math]t[/math].
We are now ready to state and prove the CLT, as follows:
Given real variables [math]f_1,f_2,f_3,\ldots\in L^\infty(X)[/math] which are i.i.d., centered, and with common variance [math]t \gt 0[/math], we have, with [math]n\to\infty[/math], in moments,
[math]\frac{1}{\sqrt{n}}\sum_{i=1}^nf_i\sim g_t[/math]
In terms of moments, the Fourier transform is given by:
[math]F_f(x)=E(e^{ixf})=\sum_{k=0}^\infty\frac{i^kM_k(f)}{k!}\,x^k[/math]
In particular, for a centered variable with variance [math]t[/math] we have [math]F_f(x)=1-\frac{tx^2}{2}+O(x^3)[/math]. Thus, the Fourier transform of the variable in the statement is:
[math]F(x)=\left[F_{f_1}\left(\frac{x}{\sqrt{n}}\right)\right]^n=\left[1-\frac{tx^2}{2n}+O(n^{-3/2})\right]^n\simeq e^{-tx^2/2}[/math]
But this function being the Fourier transform of [math]g_t[/math], we obtain the result.
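As an illustration of the CLT, here is a minimal Monte Carlo sketch in Python, with uniform variables on [math][-1,1][/math], whose variance is [math]t=1/3[/math], the even moments of the limiting law [math]g_t[/math] being the numbers [math]t^{k/2}k!![/math], as we will see in a moment:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n, samples = 1000, 10**5
t = 1.0 / 3.0  # variance of uniform(-1, 1)

s = np.zeros(samples)
for _ in range(n):  # s = (f_1 + ... + f_n) / sqrt(n), sampled 10^5 times
    s += rng.uniform(-1.0, 1.0, samples)
s /= np.sqrt(n)

for k in (2, 4, 6):
    exact = t ** (k // 2) * np.prod(np.arange(k - 1, 0, -2))  # t^{k/2} k!!
    print(k, np.mean(s ** k), exact)
</syntaxhighlight>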
Let us discuss now some further properties of the normal law. We first have:
The even moments of the normal law are the numbers
[math]M_k(g_t)=t^{k/2}\times k!![/math]
where [math]k!!=(k-1)(k-3)(k-5)\cdots[/math], the odd moments being zero.
We have the following computation, valid for any integer [math]k\in\mathbb{N}[/math], coming from partial integration:
[math]M_{k+2}=\frac{1}{\sqrt{2\pi t}}\int_\mathbb{R}x^{k+1}\cdot xe^{-x^2/2t}\,dx=\frac{(k+1)t}{\sqrt{2\pi t}}\int_\mathbb{R}x^ke^{-x^2/2t}\,dx=(k+1)t\,M_k[/math]
Now recall from the proof of Proposition 1.8 that we have [math]M_0=1[/math], [math]M_1=0[/math]. Thus by recurrence, we are led to the formula in the statement.
We have the following alternative formulation of the above result:
The moments of the normal law are the numbers
[math]M_k(g_t)=t^{k/2}\,|P_2(k)|[/math]
where [math]P_2(k)[/math] is the set of pairings of [math]\{1,\ldots,k\}[/math].
Let us count the pairings of [math]\{1,\ldots,k\}[/math]. In order to have such a pairing, we must pair [math]1[/math] with one of the numbers [math]2,\ldots,k[/math], and then use a pairing of the remaining [math]k-2[/math] numbers. Thus, with [math]P_k=|P_2(k)|[/math], we have the following recurrence formula:
[math]P_k=(k-1)P_{k-2}[/math]
As for the initial data, this is [math]P_1=0[/math], [math]P_2=1[/math]. Thus, we are led to the result.
We are not done yet, and here is one more improvement of the above:
The moments of the normal law are the numbers
[math]M_k(g_t)=\sum_{\pi\in P_2(k)}t^{|\pi|}[/math]
where [math]P_2(k)[/math] is the set of pairings of [math]\{1,\ldots,k\}[/math], and [math]|\pi|[/math] is the number of blocks.
This follows indeed from Proposition 1.12, because the number of blocks of a pairing of [math]\{1,\ldots,k\}[/math] is trivially [math]k/2[/math], independently of the pairing.
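Regarding the count [math]|P_2(k)|=k!![/math] used above, this can be verified by brute force, with a short Python sketch, enumerating all pairings recursively:
<syntaxhighlight lang="python">
def pairings(points):
    """Yield all pairings of the given list of points, as lists of pairs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i in range(len(rest)):
        for p in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, rest[i])] + p

for k in (2, 4, 6, 8, 10):
    count = sum(1 for _ in pairings(list(range(k))))
    dfact = 1
    for j in range(k - 1, 0, -2):  # k!! = (k-1)(k-3)(k-5)...
        dfact *= j
    print(k, count, dfact)  # the two columns agree
</syntaxhighlight>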
We will see later in this book that many other interesting probability distributions are subject to similar formulae regarding their moments, involving partitions.
1c. Spherical integrals
In a purely mathematical context, the simplest way of recovering the normal laws is by looking at the coordinates over the real spheres [math]S^{N-1}_\mathbb R[/math], in the [math]N\to\infty[/math] limit. To start with, at [math]N=2[/math] the sphere is the unit circle [math]\mathbb T[/math], and with [math]z=e^{it}[/math] the coordinates are [math]\cos t,\sin t[/math]. Let us first integrate powers of these coordinates. We have here:
We have the following formulae, valid for any [math]k\in\mathbb{N}[/math],
[math]\int_0^{\pi/2}\cos^kt\,dt=\left(\frac{\pi}{2}\right)^{\varepsilon(k)}\frac{k!!}{(k+1)!!}[/math]
[math]\int_0^{\pi/2}\sin^kt\,dt=\left(\frac{\pi}{2}\right)^{\varepsilon(k)}\frac{k!!}{(k+1)!!}[/math]
where [math]\varepsilon(k)=1[/math] if [math]k[/math] is even and [math]\varepsilon(k)=0[/math] if [math]k[/math] is odd, and where [math]k!!=(k-1)(k-3)(k-5)\cdots[/math]
Let us call [math]I_k[/math] the integral on the left in the statement. In order to compute it, we use partial integration. We have the following formula:
[math](\cos^{k-1}t\sin t)'=k\cos^kt-(k-1)\cos^{k-2}t[/math]
By integrating between [math]0[/math] and [math]\pi/2[/math], we obtain the following formula:
[math]kI_k=(k-1)I_{k-2}[/math]
Thus we can compute [math]I_k[/math] by recurrence, and we obtain in this way:
[math]I_k=\frac{k-1}{k}\cdot\frac{k-3}{k-2}\cdot\frac{k-5}{k-4}\cdots[/math]
with the product ending at [math]I_0[/math] or [math]I_1[/math], depending on the parity of [math]k[/math].
The initial data being [math]I_0=\pi/2[/math] and [math]I_1=1[/math], we obtain the result. As for the second formula, this follows from the first one, with the change of variables [math]t=\pi/2-s[/math].
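The above formulae can be checked by numerical quadrature. Here is a minimal Python sketch for the cosine integrals, using the midpoint rule:
<syntaxhighlight lang="python">
import numpy as np

def dfact(k):  # k!! = (k-1)(k-3)(k-5)..., with empty products being 1
    p = 1
    for j in range(k - 1, 0, -2):
        p *= j
    return p

m = 10**5
t = (np.arange(m) + 0.5) * (np.pi / 2) / m  # midpoints of [0, pi/2]
for k in range(8):
    numeric = np.mean(np.cos(t) ** k) * (np.pi / 2)
    eps = 1 if k % 2 == 0 else 0
    exact = (np.pi / 2) ** eps * dfact(k) / dfact(k + 1)
    print(k, numeric, exact)  # the two columns agree
</syntaxhighlight>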
More generally now, we have the following result:
We have the following formula, valid for any [math]r,s\in\mathbb{N}[/math],
[math]\int_0^{\pi/2}\cos^rt\sin^st\,dt=\left(\frac{\pi}{2}\right)^{\varepsilon(r)\varepsilon(s)}\frac{r!!\,s!!}{(r+s+1)!!}[/math]
Let us call [math]I_{rs}[/math] the integral in the statement. In order to do the partial integration, observe that we have the following formula:
[math](\cos^rt\sin^st)'=s\cos^{r+1}t\sin^{s-1}t-r\cos^{r-1}t\sin^{s+1}t[/math]
By integrating between [math]0[/math] and [math]\pi/2[/math], we obtain, for [math]r,s \gt 0[/math]:
[math]rI_{r-1,s+1}=sI_{r+1,s-1}[/math]
Thus, we can compute [math]I_{rs}[/math] by recurrence. When [math]s[/math] is even we have:
[math]I_{rs}=\frac{s-1}{r+1}\,I_{r+2,s-2}=\frac{s-1}{r+1}\cdot\frac{s-3}{r+3}\,I_{r+4,s-4}=\ldots=\frac{s-1}{r+1}\cdot\frac{s-3}{r+3}\cdots\frac{1}{r+s-1}\,I_{r+s,0}[/math]
But the last term comes from Proposition 1.14, and we obtain the result:
[math]I_{rs}=\frac{s-1}{r+1}\cdot\frac{s-3}{r+3}\cdots\frac{1}{r+s-1}\left(\frac{\pi}{2}\right)^{\varepsilon(r+s)}\frac{(r+s)!!}{(r+s+1)!!}=\left(\frac{\pi}{2}\right)^{\varepsilon(r)\varepsilon(s)}\frac{r!!\,s!!}{(r+s+1)!!}[/math]
Observe that this gives the result for [math]r[/math] even as well, by symmetry. In the remaining case now, where both the exponents [math]r,s[/math] are odd, we can use once again the formula [math]rI_{r-1,s+1}=sI_{r+1,s-1}[/math] found above, and the recurrence goes as follows:
[math]I_{rs}=\frac{s-1}{r+1}\cdot\frac{s-3}{r+3}\cdots\frac{2}{r+s-2}\,I_{r+s-1,1}[/math]
In order to compute the last term, observe that we have:
[math]I_{r1}=\int_0^{\pi/2}\cos^rt\sin t\,dt=\left[-\frac{\cos^{r+1}t}{r+1}\right]_0^{\pi/2}=\frac{1}{r+1}[/math]
Thus, we obtain the formula in the statement, the exponent of [math]\pi/2[/math] appearing there being [math]\varepsilon(r)\varepsilon(s)=0\cdot 0=0[/math] in the present case, and this finishes the proof.
In order to deal now with the higher spheres, we will use spherical coordinates:
We have spherical coordinates in [math]N[/math] dimensions,
[math]\begin{cases}x_1=r\cos t_1\\ x_2=r\sin t_1\cos t_2\\ \ \vdots\\ x_{N-1}=r\sin t_1\sin t_2\ldots\sin t_{N-2}\cos t_{N-1}\\ x_N=r\sin t_1\sin t_2\ldots\sin t_{N-2}\sin t_{N-1}\end{cases}[/math]
the corresponding Jacobian being given by the following formula:
[math]J=r^{N-1}\sin^{N-2}t_1\sin^{N-3}t_2\ldots\sin^2t_{N-3}\sin t_{N-2}[/math]
The fact that we have indeed spherical coordinates is clear. Regarding now the Jacobian, by developing over the last column, we have:
Thus, we obtain the formula in the statement, by recurrence.
As a first application, we can compute the volume of the sphere:
The volume of the unit sphere in [math]\mathbb{R}^N[/math] is given by
[math]V=\left(\frac{\pi}{2}\right)^{[N/2]}\frac{2^N}{(N+1)!!}[/math]
with the same double factorial convention as before, [math]k!!=(k-1)(k-3)(k-5)\cdots[/math]
If we denote by [math]Q[/math] the positive part of the sphere, obtained by cutting the sphere in [math]2^N[/math] parts, we have, by using Theorems 1.15 and 1.16 and Fubini:
Here we have used the following formula for computing the exponent of [math]\pi/2[/math], where [math]\varepsilon(r)=1[/math] if [math]r[/math] is even and [math]\varepsilon(r)=0[/math] if [math]r[/math] is odd, as in Theorem 1.15:
[math]\varepsilon(0)+\varepsilon(1)+\varepsilon(2)+\ldots+\varepsilon(N-2)=\left[\frac{N}{2}\right][/math]
Thus, we are led to the conclusion in the statement.
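As a cross-check, the above formula can be compared with the standard Gamma function formula for the volume of the unit ball, namely [math]V=\pi^{N/2}/\Gamma(N/2+1)[/math]. A minimal Python sketch:
<syntaxhighlight lang="python">
from math import pi, gamma

def dfact(k):  # k!! = (k-1)(k-3)(k-5)..., with empty products being 1
    p = 1
    for j in range(k - 1, 0, -2):
        p *= j
    return p

for N in range(1, 9):
    ours = (pi / 2) ** (N // 2) * 2**N / dfact(N + 1)
    reference = pi ** (N / 2) / gamma(N / 2 + 1)
    print(N, ours, reference)  # the two columns agree
</syntaxhighlight>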
Let us discuss now the computation of the arbitrary polynomial integrals, over the spheres of arbitrary dimension. The result here is as follows:
The spherical integral of [math]x_{i_1}\ldots x_{i_r}[/math] vanishes, unless each index [math]a\in\{1,\ldots,N\}[/math] appears an even number of times in the sequence [math]i_1,\ldots,i_r[/math]. We have
[math]\int_{S^{N-1}_\mathbb{R}}x_{i_1}\ldots x_{i_r}\,dx=\frac{(N-1)!!\,k_1!!\,k_2!!\ldots k_N!!}{(N+\sum_ak_a-1)!!}[/math]
with [math]k_a[/math] being the number of occurrences of [math]a[/math] in the sequence [math]i_1,\ldots,i_r[/math], the integration being with respect to the normalized uniform measure on the sphere.
In what concerns the first assertion, regarding vanishing when some multiplicity [math]k_a[/math] is odd, this follows via the change of variables [math]x_a\to-x_a[/math]. Regarding now the formula in the statement, assume that we are in the case [math]k_a\in 2\mathbb N[/math], for any [math]a\in\{1,\ldots,N\}[/math]. The integral in the statement can be written in spherical coordinates, as follows:
In this formula [math]V[/math] is the volume of the sphere, [math]J[/math] is the Jacobian, and the [math]2^N[/math] factor comes from the restriction to the [math]1/2^N[/math] part of the sphere where all the coordinates are positive. According to the formula in Theorem 1.17, the normalization constant is:
[math]\frac{2^N}{V}=\left(\frac{2}{\pi}\right)^{[N/2]}(N+1)!![/math]
As for the unnormalized integral, this is given by:
By rearranging the terms, we obtain:
Now by using the formula in Theorem 1.15, this gives:
Now observe that the various double factorials multiply up to the quantity in the statement, modulo a [math](N-1)!![/math] factor, and that the [math]\pi/2[/math] factors multiply up to:
[math]\left(\frac{\pi}{2}\right)^{[N/2]}[/math]
Thus by multiplying by the normalization constant, we obtain the result.
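The above formula lends itself to a Monte Carlo verification, based on the standard fact that a normalized vector of i.i.d. standard Gaussian variables is uniformly distributed on the sphere. A minimal Python sketch, for a sample choice of even exponents:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)

def dfact(k):  # k!! = (k-1)(k-3)(k-5)..., with empty products being 1
    p = 1
    for j in range(k - 1, 0, -2):
        p *= j
    return p

N, k = 5, np.array([4, 2, 2, 0, 0])  # all exponents even
x = rng.standard_normal((10**6, N))
x /= np.linalg.norm(x, axis=1, keepdims=True)  # uniform points on the sphere

mc = np.mean(np.prod(x ** k, axis=1))
exact = dfact(N - 1) * np.prod([dfact(j) for j in k]) / dfact(N + k.sum() - 1)
print(mc, exact)  # agreement, up to the Monte Carlo error
</syntaxhighlight>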
We can now recover the normal laws, geometrically, as follows:
The moments of the hyperspherical variables are
[math]\int_{S^{N-1}_\mathbb{R}}x_i^{2k}\,dx=\frac{(N-1)!!\,(2k)!!}{(N+2k-1)!!}[/math]
and the rescaled variables [math]y_i=\sqrt{N}x_i[/math] become normal and independent with [math]N\to\infty[/math].
The moment formula in the statement follows from Theorem 1.18. As a consequence, with [math]N\to\infty[/math] we have the following estimate:
[math]\int_{S^{N-1}_\mathbb{R}}x_i^{2k}\,dx\simeq N^{-k}\,(2k)!![/math]
Thus, the rescaled variables [math]\sqrt{N}x_i[/math] become normal with [math]N\to\infty[/math], as claimed. As for the proof of the asymptotic independence, this is standard too, once again by using the formula in Theorem 1.18. Indeed, the joint moments of [math]x_1,\ldots,x_N[/math] are given by:
[math]\int_{S^{N-1}_\mathbb{R}}x_1^{k_1}\ldots x_N^{k_N}\,dx=\frac{(N-1)!!\,k_1!!\ldots k_N!!}{(N+\sum_ak_a-1)!!}[/math]
By rescaling, the joint moments of the variables [math]y_i=\sqrt{N}x_i[/math] are given by:
[math]E\left(y_1^{k_1}\ldots y_N^{k_N}\right)=N^{\sum_ak_a/2}\cdot\frac{(N-1)!!\,k_1!!\ldots k_N!!}{(N+\sum_ak_a-1)!!}\simeq k_1!!\ldots k_N!![/math]
Thus, we have multiplicativity, and so independence with [math]N\to\infty[/math], as claimed.
As a last result about the normal laws, we can recover these as well in connection with rotation groups. Indeed, we have the following reformulation of Theorem 1.19:
We have the integration formula
[math]\int_{O_N}u_{ij}^{2k}\,dU=\frac{(N-1)!!\,(2k)!!}{(N+2k-1)!!}[/math]
valid for any [math]i,j[/math], and the rescaled coordinates [math]\sqrt{N}u_{ij}[/math] become normal and independent with [math]N\to\infty[/math].
We use the basic fact that the rotations [math]U\in O_N[/math] act on the points of the real sphere [math]z\in S^{N-1}_\mathbb{R}[/math], with the stabilizer of [math]z=(1,0,\ldots,0)[/math] being the subgroup [math]O_{N-1}\subset O_N[/math]. In algebraic terms, this gives an identification as follows:
[math]S^{N-1}_\mathbb{R}=O_N/O_{N-1}[/math]
In functional analytic terms, this result provides us with an embedding as follows, for any [math]i[/math], which makes correspond the respective integration functionals:
[math]L^\infty(S^{N-1}_\mathbb{R})\subset L^\infty(O_N)\quad,\quad x_i\to u_{i1}[/math]
With this identification made, the result follows from Theorem 1.19.
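As an illustration, Haar-distributed rotations can be generated with the standard QR trick, namely taking the QR decomposition of a Gaussian matrix, with a sign correction, and the moments of the rescaled coordinates can then be compared with the normal moments [math]k!![/math]. A minimal Python sketch, the agreement being only up to [math]O(1/N)[/math] corrections, at fixed [math]N[/math]:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(3)
N, samples = 50, 20000
u11 = np.empty(samples)

for s in range(samples):
    g = rng.standard_normal((N, N))
    q, r = np.linalg.qr(g)
    q *= np.sign(np.diag(r))  # sign correction, making Q Haar-distributed
    u11[s] = q[0, 0]          # the coordinate u_{11}

y = np.sqrt(N) * u11  # rescaled coordinate
for k in (2, 4, 6):
    exact = np.prod(np.arange(k - 1, 0, -2))  # k!! = (k-1)(k-3)...1
    print(k, np.mean(y ** k), exact)
</syntaxhighlight>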
We will see later, following [3], [4], that the relation between the orthogonal group [math]O_N[/math] and the normal laws goes well beyond Theorem 1.20. And we will see as well, following [5], [6] and related papers, that there are also “free versions” of all this.
1d. Complex variables
We have seen so far a number of interesting results regarding the normal laws, and their geometric interpretation. As a last topic for this chapter, let us discuss now the complex analogues of all this. To start with, we have the following definition:
The complex Gaussian law of parameter [math]t \gt 0[/math] is
[math]G_t=law\left(\frac{1}{\sqrt{2}}(a+ib)\right)[/math]
where [math]a,b[/math] are independent, each following the law [math]g_t[/math].
As in the real case, these measures form convolution semigroups:
The complex Gaussian laws have the property
[math]G_s*G_t=G_{s+t}[/math]
for any [math]s,t \gt 0[/math], and so they form a convolution semigroup.
This follows indeed from the real result, namely [math]g_s*g_t=g_{s+t}[/math], established in Theorem 1.9, simply by taking real and imaginary parts.
We have as well the following complex analogue of the CLT:
Given complex variables [math]f_1,f_2,f_3,\ldots\in L^\infty(X)[/math] which are i.i.d., centered, and with common variance [math]t \gt 0[/math], we have, with [math]n\to\infty[/math], in moments,
[math]\frac{1}{\sqrt{n}}\sum_{i=1}^nf_i\sim G_t[/math]
This follows indeed from the real CLT, established in Theorem 1.10, simply by taking the real and imaginary parts of all variables involved.
Regarding now the moments, the situation is more complicated than in the real case, because in order to have good results, we have to deal with both the complex variables, and their conjugates. Let us formulate the following definition:
The moments of a complex variable [math]f\in L^\infty(X)[/math] are the numbers
[math]M_k(f)=E(f^k)[/math]
with [math]k=\circ\bullet\bullet\circ\ldots[/math] being colored integers, with the conventions [math]f^\circ=f[/math], [math]f^\bullet=\bar{f}[/math], and multiplicativity.
Observe that, since [math]f,\bar{f}[/math] commute, we can permute terms, and restrict the attention to exponents of type [math]k=\ldots\circ\circ\circ\bullet\bullet\bullet\bullet\ldots\,[/math], if we want to. However, our results about the complex Gaussian laws, and other complex laws, later on, will actually look better without doing so, and so we will use Definition 1.24 as stated. We first have:
The moments of the complex normal law are given by
[math]M_k(G_t)=\begin{cases}t^pp!&\text{if }k\text{ is uniform, }|k|=2p\\ 0&\text{otherwise}\end{cases}[/math]
where [math]k[/math] is called uniform when it consists of the same number of [math]\circ[/math] and [math]\bullet[/math] symbols.
We must compute the moments, with respect to colored integer exponents [math]k=\circ\bullet\bullet\circ\ldots[/math]\,, of the variable from Definition 1.21, namely:
[math]f=\frac{1}{\sqrt{2}}(a+ib)[/math]
with [math]a,b[/math] being independent, each following the law [math]g_t[/math].
We can assume that we are in the case [math]t=1[/math], and the proof here goes as follows:
(1) As a first observation, in the case where our exponent [math]k=\circ\bullet\bullet\circ\ldots[/math] is not uniform, a standard rotation argument shows that the corresponding moment of [math]f[/math] vanishes. To be more precise, the variable [math]f'=wf[/math] is complex Gaussian too, for any complex number [math]w\in\mathbb{T}[/math], and from [math]M_k(f)=M_k(f')=w^{p-q}M_k(f)[/math], with [math]p,q[/math] being the numbers of [math]\circ,\bullet[/math] symbols in [math]k[/math], and with [math]p\neq q[/math], we obtain [math]M_k(f)=0[/math], in this case.
(2) In the uniform case now, where the exponent [math]k=\circ\bullet\bullet\circ\ldots[/math] consists of [math]p[/math] copies of [math]\circ[/math] and [math]p[/math] copies of [math]\bullet[/math]\,, the corresponding moment can be computed as follows, by using the moment formula for [math]g_1[/math]:
[math]M_k=E\left((f\bar{f})^p\right)=\frac{1}{2^p}\sum_{r+s=p}\binom{p}{r}M_{2r}(a)M_{2s}(b)=\frac{p!}{4^p}\sum_{r+s=p}\binom{2r}{r}\binom{2s}{s}[/math]
(3) In order to finish now the computation, let us recall that we have the following formula, coming from the generalized binomial formula, or from the Taylor formula:
[math]\frac{1}{\sqrt{1-4t}}=\sum_{p=0}^\infty\binom{2p}{p}t^p[/math]
By taking the square of this series, we obtain the following formula:
[math]\frac{1}{1-4t}=\sum_{p=0}^\infty t^p\sum_{r+s=p}\binom{2r}{r}\binom{2s}{s}[/math]
Now by looking at the coefficient of [math]t^p[/math] on both sides, we conclude that the sum on the right equals [math]4^p[/math]. Thus, we can finish the moment computation in (2), as follows:
[math]M_k=\frac{p!}{4^p}\times4^p=p![/math]
In the case of a general parameter [math]t \gt 0[/math], the same computation gives [math]M_k=t^pp![/math], by rescaling.
We are therefore led to the conclusion in the statement.
As before with the real Gaussian laws, a better-looking statement is in terms of partitions. Given a colored integer [math]k=\circ\bullet\bullet\circ\ldots\,[/math], we say that a pairing [math]\pi\in P_2(k)[/math] is matching when it pairs [math]\circ-\bullet[/math] symbols. With this convention, we have the following result:
The moments of the complex normal law are the numbers
[math]M_k(G_t)=t^{|k|/2}\,|\mathcal{P}_2(k)|[/math]
where [math]\mathcal{P}_2(k)[/math] is the set of matching pairings of [math]k[/math].
This is a reformulation of Theorem 1.25. Indeed, we can assume that we are in the case [math]t=1[/math], and here we know from Theorem 1.25 that the moments are:
[math]M_k=\begin{cases}p!&\text{if }k\text{ is uniform, }|k|=2p\\ 0&\text{otherwise}\end{cases}[/math]
On the other hand, the numbers [math]|\mathcal P_2(k)|[/math] are given by exactly the same formula. Indeed, in order to have a matching pairing of [math]k[/math], our exponent [math]k=\circ\bullet\bullet\circ\ldots[/math] must be uniform, consisting of [math]p[/math] copies of [math]\circ[/math] and [math]p[/math] copies of [math]\bullet[/math], with [math]p=|k|/2[/math]. But then the matching pairings of [math]k[/math] correspond to the permutations of the [math]\bullet[/math] symbols, as to be matched with [math]\circ[/math] symbols, and so we have [math]p![/math] such pairings. Thus, we have the same formula as for the moments of [math]f[/math], and we are led to the conclusion in the statement.
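Here again a brute-force verification of the count [math]|\mathcal{P}_2(k)|=p![/math] is possible, by filtering the matching pairings out of all pairings. A minimal Python sketch, with [math]\circ,\bullet[/math] encoded as the letters o, b:
<syntaxhighlight lang="python">
from math import factorial

def pairings(points):
    """Yield all pairings of the given list of points, as lists of pairs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i in range(len(rest)):
        for p in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, rest[i])] + p

for word in ("ob", "oobb", "obob", "ooobbb", "oobbob"):
    positions = list(range(len(word)))
    matching = sum(1 for p in pairings(positions)
                   if all(word[a] != word[b] for a, b in p))  # pair o with b only
    print(word, matching, factorial(len(word) // 2))  # p! matching pairings
</syntaxhighlight>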
In practice, we also need to know how to compute joint moments of independent normal variables. We have here the following result, to be heavily used later on:
Given independent variables [math]f_i[/math], each following the complex normal law [math]G_t[/math], with [math]t \gt 0[/math] being a fixed parameter, we have the formula
[math]E\left(f_{i_1}^{k_1}\ldots f_{i_s}^{k_s}\right)=t^{s/2}\#\left\{\pi\in\mathcal{P}_2(k)\,\Big|\,\pi\leq\ker i\right\}[/math]
where [math]k=k_1\ldots k_s[/math] and [math]i=(i_1,\ldots,i_s)[/math], with [math]\ker i[/math] being the partition of [math]\{1,\ldots,s\}[/math] whose blocks collect the equal indices of [math]i[/math].
This is something well-known, which can be proved as follows:
(1) Let us first discuss the case where we have a single variable [math]f[/math], which amounts to taking [math]f_i=f[/math] for any [math]i[/math] in the formula in the statement. What we have to compute here are the moments of [math]f[/math], with respect to colored integer exponents [math]k=\circ\bullet\bullet\circ\ldots\,[/math], and the formula in the statement tells us that these moments must be:
[math]M_k(f)=t^{|k|/2}\,|\mathcal{P}_2(k)|[/math]
But this is the formula in Theorem 1.26, so we are done with this case.
(2) In general now, when expanding the product [math]f_{i_1}^{k_1}\ldots f_{i_s}^{k_s}[/math] and rearranging the terms, we are left with doing a number of computations as in (1), and then making the product of the expectations that we found. But this amounts to counting the partitions in the statement, with the condition [math]\pi\leq\ker i[/math] there standing for the fact that we are doing the various type (1) computations independently, and then making the product.
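As an illustration, here is a minimal Monte Carlo sketch of the Wick formula in Python, at [math]t=1[/math], for a few sample moments, with the admissible pairings [math]\pi\leq\ker i[/math] counted as in the statement:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(4)
n = 10**6

def complex_gaussian(size):  # standard complex Gaussian variable, t = 1
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

f1, f2 = complex_gaussian(n), complex_gaussian(n)

print(np.mean(f1 * f2 * np.conj(f1) * np.conj(f2)))  # one admissible pairing: ~1
print(np.mean(f1 * f1 * np.conj(f2) * np.conj(f2)))  # no admissible pairing: ~0
print(np.mean(np.abs(f1) ** 4))                      # single variable, p = 2: ~2
</syntaxhighlight>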
The above statement is one of the possible formulations of the Wick formula, and there are in fact many more formulations, which are all useful. We will be back to this in chapter 6 below, when discussing applications of the Wick formula. Getting back now to geometric aspects, in the spirit of what we did in the real case, we have:
We have the following integration formula over the complex sphere [math]S^{N-1}_\mathbb{C}\subset\mathbb{C}^N[/math], with respect to the normalized uniform measure,
[math]\int_{S^{N-1}_\mathbb{C}}|z_1|^{2k_1}\ldots|z_N|^{2k_N}\,dz=\frac{(N-1)!\,k_1!\ldots k_N!}{(N+\sum_ak_a-1)!}[/math]
valid for any exponents [math]k_a\in\mathbb{N}[/math]. As for the other polynomial integrals in [math]z_1,\ldots,z_N,\bar{z}_1,\ldots,\bar{z}_N[/math], these all vanish.
Consider an arbitrary polynomial integral over [math]S^{N-1}_\mathbb{C}[/math], written as follows:
[math]I=\int_{S^{N-1}_\mathbb{C}}z_{i_1}\ldots z_{i_r}\bar{z}_{j_1}\ldots\bar{z}_{j_s}\,dz[/math]
By using transformations of type [math]p\to\lambda p[/math] with [math]|\lambda|=1[/math], we see that this integral [math]I[/math] vanishes, unless each [math]z_a[/math] appears as many times as [math]\bar{z}_a[/math] does, and this gives the last assertion. So, assume now that we are in the non-vanishing case. Then the [math]k_a[/math] copies of [math]z_a[/math] and the [math]k_a[/math] copies of [math]\bar{z}_a[/math] produce by multiplication a factor [math]|z_a|^{2k_a}[/math], so we have:
[math]I=\int_{S^{N-1}_\mathbb{C}}|z_1|^{2k_1}\ldots|z_N|^{2k_N}\,dz[/math]
Now by using the standard identification [math]S^{N-1}_\mathbb{C}\simeq S^{2N-1}_\mathbb{R}[/math], we obtain:
[math]I=\int_{S^{2N-1}_\mathbb{R}}(x_1^2+y_1^2)^{k_1}\ldots(x_N^2+y_N^2)^{k_N}\,d(x,y)=\sum_{r_1\ldots r_N}\binom{k_1}{r_1}\ldots\binom{k_N}{r_N}\int_{S^{2N-1}_\mathbb{R}}x_1^{2r_1}y_1^{2k_1-2r_1}\ldots x_N^{2r_N}y_N^{2k_N-2r_N}\,d(x,y)[/math]
By using the formula in Theorem 1.18, we obtain:
[math]I=\sum_{r_1\ldots r_N}\binom{k_1}{r_1}\ldots\binom{k_N}{r_N}\cdot\frac{(2N-1)!!\,(2r_1)!!\,(2k_1-2r_1)!!\ldots(2r_N)!!\,(2k_N-2r_N)!!}{(2N+2\sum_ak_a-1)!!}[/math]
Now observe that we can rewrite this quantity in the following way:
[math]I=\frac{(2N-1)!!\ 2^{\sum_ak_a}\,k_1!\ldots k_N!}{(2N+2\sum_ak_a-1)!!}=\frac{(N-1)!\,k_1!\ldots k_N!}{(N+\sum_ak_a-1)!}[/math]
Here we have used the following well-known identity, whose proof is standard:
[math]\sum_{r+s=k}\binom{k}{r}(2r)!!\,(2s)!!=2^kk![/math]
Thus, we obtain the formula in the statement.
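Here as well a Monte Carlo check is possible, based on the fact that a normalized vector of i.i.d. standard complex Gaussian variables is uniformly distributed on the complex sphere. A minimal Python sketch:
<syntaxhighlight lang="python">
import numpy as np
from math import factorial

rng = np.random.default_rng(5)
N, k = 4, np.array([2, 1, 0, 0])
m = 10**6

z = rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))
z /= np.linalg.norm(z, axis=1, keepdims=True)  # uniform points on the complex sphere

mc = np.mean(np.prod(np.abs(z) ** (2 * k), axis=1))
exact = factorial(N - 1) * np.prod([factorial(j) for j in k]) / factorial(N + k.sum() - 1)
print(mc, exact)  # both of order 1/60
</syntaxhighlight>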
Regarding now the hyperspherical variables, investigated above in the real case, we have similar results in the complex case, as follows:
The rescalings [math]\sqrt{N}z_i[/math] of the unit complex sphere coordinates
[math]z_1,\ldots,z_N\in C(S^{N-1}_\mathbb{C})[/math]
become complex Gaussian and independent with [math]N\to\infty[/math], and the same happens for the rescalings [math]\sqrt{N}u_{ij}[/math] of the unitary group coordinates, [math]u_{ij}\in C(U_N)[/math].
We have several assertions to be proved, the idea being as follows:
(1) According to the formula in Theorem 1.28, the polynomial integrals in [math]z_i,\bar{z}_i[/math] vanish, unless the number of [math]z_i,\bar{z}_i[/math] is the same. In this latter case these terms can be grouped together, by using [math]z_i\bar{z}_i=|z_i|^2[/math], and the relevant integration formula is:
[math]\int_{S^{N-1}_\mathbb{C}}|z_i|^{2k}\,dz=\frac{(N-1)!\,k!}{(N+k-1)!}[/math]
Now with [math]N\to\infty[/math], we obtain from this the following estimate:
[math]\int_{S^{N-1}_\mathbb{C}}|z_i|^{2k}\,dz\simeq N^{-k}\,k![/math]
Thus, the rescaled variables [math]\sqrt{N}z_i[/math] become normal with [math]N\to\infty[/math], as claimed.
(2) As for the proof of the asymptotic independence, this is standard too, again by using Theorem 1.28. Indeed, the joint moments of [math]z_1,\ldots,z_N[/math] are given by:
[math]\int_{S^{N-1}_\mathbb{C}}|z_1|^{2k_1}\ldots|z_N|^{2k_N}\,dz=\frac{(N-1)!\,k_1!\ldots k_N!}{(N+\sum_ak_a-1)!}[/math]
By rescaling, the joint moments of the variables [math]y_i=\sqrt{N}z_i[/math] are given by:
[math]E\left(|y_1|^{2k_1}\ldots|y_N|^{2k_N}\right)=N^{\sum_ak_a}\cdot\frac{(N-1)!\,k_1!\ldots k_N!}{(N+\sum_ak_a-1)!}\simeq k_1!\ldots k_N![/math]
Thus, we have multiplicativity, and so independence with [math]N\to\infty[/math], as claimed.
(3) Regarding the last assertion, we can use the basic fact that the rotations [math]U\in U_N[/math] act on the points of the sphere [math]z\in S^{N-1}_\mathbb{C}[/math], with the stabilizer of [math]z=(1,0,\ldots,0)[/math] being the subgroup [math]U_{N-1}\subset U_N[/math]. In algebraic terms, this gives an identification as follows:
[math]S^{N-1}_\mathbb{C}=U_N/U_{N-1}[/math]
In functional analytic terms, this result provides us with an embedding as follows, for any [math]i[/math], which makes correspond the respective integration functionals:
[math]L^\infty(S^{N-1}_\mathbb{C})\subset L^\infty(U_N)\quad,\quad z_i\to u_{i1}[/math]
With this identification made, the result follows from (1,2).
As already mentioned in the real context, it is possible to get beyond such results, by using advanced group theory. We will be back to this, in chapter 4 below. It is also possible to formulate “free versions” of all the above, and we will do this later.
So much for the basics of probability theory, quickly explained. For further theory, the best is to go to a dedicated probability book, such as the one by Feller [1], or by Durrett [2]. Alternatively, you can learn good probability theory from the preliminary chapters of more specialized probability books, with the comment that, among probabilists, the random matrix people know their job well, and are very close to what we will be doing in this book. Well-known introductions to random matrices include the classical and delightful book by Mehta [7], the more modern book by Anderson, Guionnet and Zeitouni [8], the books by Bose [9] and by Mingo and Speicher [10], and many more.
Needless to say, you can also learn reliable probability theory from physicists, or other scientists. In fact, probability theory was fully accepted as a respectable branch of mathematics only recently, in the late 20th century, and if there are scientists who have always taken probability seriously, these are the physicists.
References
- [1] W. Feller, An introduction to probability theory and its applications, Wiley (1950).
- [2] R. Durrett, Probability: theory and examples, Cambridge Univ. Press (1990).
- [3] B. Collins and P. Śniady, Integration with respect to the Haar measure on unitary, orthogonal and symplectic groups, Comm. Math. Phys. 264 (2006), 773–795.
- [4] D. Weingarten, Asymptotic behavior of group integrals in the limit of infinite rank, J. Math. Phys. 19 (1978), 999–1001.
- [5] T. Banica, Introduction to quantum groups, Springer (2023).
- [6] T. Banica and B. Collins, Integration over compact quantum groups, Publ. Res. Inst. Math. Sci. 43 (2007), 277–302.
- [7] M.L. Mehta, Random matrices, Elsevier (1967).
- [8] G.W. Anderson, A. Guionnet and O. Zeitouni, An introduction to random matrices, Cambridge Univ. Press (2010).
- [9] A. Bose, Random matrices and non-commutative probability, CRC Press (2021).
- [10] J.A. Mingo and R. Speicher, Free probability and random matrices, Springer (2017).