Spectral measures



5a. Linear algebra

We have seen so far some interesting probability theory, dealing with usual random variables, which are by definition functions as follows, real or complex:

[[math]] f\in L^\infty(X) [[/math]]


We discuss in what follows more advanced aspects of probability theory, which are of a rather “noncommutative” nature, in relation with random matrices:

Definition

A random matrix is a square matrix of type

[[math]] Z\in M_N(L^\infty(X)) [[/math]]
with [math]X[/math] being a probability space, and [math]N\in\mathbb N[/math] being an integer.

As basic examples, we have the usual matrices [math]Z\in M_N(\mathbb C)[/math], obtained by taking [math]X=\{.\}[/math]. Also, we have the usual random variables [math]Z\in L^\infty(X)[/math], obtained by taking [math]N=1[/math]. In general, what we have is a joint generalization of these two situations.


As a first task, we must understand what the distribution of a random matrix is. This is something non-trivial, which will take some time. Let us begin with a discussion concerning the usual matrices [math]A\in M_N(\mathbb C)[/math]. We have here the following definition:

Definition

The moments of a complex matrix [math]A\in M_N(\mathbb C)[/math] are the following numbers, with [math]tr=N^{-1}\cdot Tr[/math] being the normalized matrix trace:

[[math]] M_k=tr(A^k) [[/math]]
The distribution, or law, of our matrix [math]A[/math] is the following abstract functional:

[[math]] \mu_A:\mathbb C[X]\to\mathbb C\quad,\quad P\to tr(P(A)) [[/math]]
In the case where we have a probability measure [math]\mu_A\in\mathcal P(\mathbb C)[/math] such that

[[math]] tr(P(A))=\int_\mathbb CP(x)\,d\mu_A(x) [[/math]]
we identify this complex measure with the distribution of [math]A[/math].

As a basic example for this, consider the case of a diagonal matrix:

[[math]] A=\begin{pmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N\end{pmatrix} [[/math]]


The powers of [math]A[/math], with respect to integer exponents [math]k\in\mathbb N[/math], are as follows:

[[math]] A^k=\begin{pmatrix} \lambda_1^k\\ &\ddots\\ &&\lambda_N^k\end{pmatrix} [[/math]]


Thus the moments of [math]A[/math] are given by the following formula:

[[math]] M_k=\frac{1}{N}\sum_i\lambda_i^k [[/math]]


More generally now, we have the following formula, valid for any [math]P\in\mathbb C[X][/math]:

[[math]] P(A)=\begin{pmatrix} P(\lambda_1)\\ &\ddots\\ &&P(\lambda_N)\end{pmatrix} [[/math]]


By applying the normalized trace, we obtain from this formula:

[[math]] \begin{eqnarray*} tr(P(A)) &=&\frac{1}{N}(P(\lambda_1)+\ldots+P(\lambda_N))\\ &=&\frac{1}{N}\int_\mathbb CP(x)d(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N})(x)\\ &=&\int_\mathbb CP(x)d\left(\frac{1}{N}(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N})\right)(x) \end{eqnarray*} [[/math]]


Thus, according to Definition 5.2, the law of [math]A[/math] is the following measure:

[[math]] \mu_A=\frac{1}{N}(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N}) [[/math]]


Quite remarkably, the distribution always exists as a probability measure on [math]\mathbb C[/math], and is given by the above formula, as the average of the Dirac masses at the eigenvalues:

Theorem

For any matrix [math]A\in M_N(\mathbb C)[/math] we have the formula

[[math]] tr(P(A))=\frac{1}{N}(P(\lambda_1)+\ldots+P(\lambda_N)) [[/math]]
where [math]\lambda_1,\ldots,\lambda_N\in\mathbb C[/math] are the eigenvalues of [math]A[/math]. Thus the complex measure

[[math]] \mu_A=\frac{1}{N}(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N}) [[/math]]
is the distribution of [math]A[/math], in the abstract sense of Definition 5.2.


Proof

According to the above discussion, the result holds for the diagonal matrices. More generally now, let us discuss the case where our matrix [math]A[/math] is diagonalizable. Here we must have a formula as follows, with [math]D[/math] being diagonal:

[[math]] A=PDP^{-1} [[/math]]


Now observe that the moments of [math]A[/math] are given by the following formula:

[[math]] \begin{eqnarray*} tr(A^k) &=&tr(PDP^{-1}\cdot PDP^{-1}\ldots PDP^{-1})\\ &=&tr(PD^kP^{-1})\\ &=&tr(D^k) \end{eqnarray*} [[/math]]


We conclude, by linearity, that the matrices [math]A,D[/math] have the same distribution:

[[math]] \mu_A=\mu_D [[/math]]


On the other hand, [math]A=PDP^{-1}[/math] shows that [math]A,D[/math] have the same eigenvalues. Thus, if we denote by [math]\lambda_1,\ldots,\lambda_N\in\mathbb C[/math] these eigenvalues, we obtain:

[[math]] \mu_A=\frac{1}{N}(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N}) [[/math]]


Finally, in the general case, the result follows from what we know from the above, by using the well-known fact that the diagonalizable matrices are dense.
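As a quick illustration of this, here is a numerical verification in Python with numpy; the matrix, the test polynomial and the random seed below are arbitrary choices, made for checking purposes only, and are in no way part of the theory:

```python
import numpy as np

np.random.seed(0)
N = 5
A = np.random.randn(N, N) + 1j * np.random.randn(N, N)   # generic, hence diagonalizable

# test polynomial P(X) = X^3 - 2X + 1, evaluated on matrices and on numbers
def P_matrix(M):
    return np.linalg.matrix_power(M, 3) - 2 * M + np.eye(len(M))

def P_scalar(z):
    return z**3 - 2 * z + 1

lhs = np.trace(P_matrix(A)) / N                   # tr(P(A)), with tr = Tr/N
rhs = np.mean(P_scalar(np.linalg.eigvals(A)))     # average of P over the eigenvalues
print(lhs, rhs)                                   # the two numbers coincide, up to rounding
```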

Summarizing, we have a nice theory for the matrices [math]A\in M_N(\mathbb C)[/math], paralleling that of the random variables [math]f\in L^\infty(X)[/math]. It is tempting at this point to try to go further, and unify the matrices and the random variables, by talking about random matrices:

[[math]] Z\in M_N(L^\infty(X)) [[/math]]


However, we will not do this right away, because our matrix theory has a flaw. Indeed, all that has been said above does not take into account the adjoint matrix:

[[math]] A^*=(\bar{A}_{ji}) [[/math]]


To be more precise, the idea is that the matrices [math]A\in M_N(\mathbb C)[/math] do not come alone, but rather in pairs [math](A,A^*)[/math], because no matter what advanced thing you want to do with [math]A[/math], at some point you will run into its adjoint [math]A^*[/math]. Thus, we must talk about the moments and distribution of the pair [math](A,A^*)[/math]. This can be done as follows:

Definition

The generalized moments of a complex matrix [math]A\in M_N(\mathbb C)[/math] are the following numbers, indexed by the colored integers [math]k=\circ\bullet\bullet\circ\ldots[/math]

[[math]] M_k=tr(A^k) [[/math]]
with [math]A^k[/math] being defined by the following formulae and multiplicativity, [math]A^{kl}=A^kA^l[/math],

[[math]] A^\emptyset=1\quad,\quad A^\circ=A\quad,\quad A^\bullet=A^* [[/math]]
and with [math]tr=N^{-1}\cdot Tr[/math] being as usual the normalized matrix trace.

All this might seem a bit complicated, but this is the situation, and there is no other way of dealing with such things. Indeed, the variables [math]A,A^*[/math] do not commute, unless the matrix is normal, [math]AA^*=A^*A[/math], which is something special, not happening in general. Thus we are led to colored exponents [math]k=\circ\bullet\bullet\circ\ldots[/math] and to the above definition for the moments. Regarding now the distribution, we can use a similar idea, as follows:

Definition

The generalized distribution, or law, of a matrix [math]A\in M_N(\mathbb C)[/math] is the abstract functional [math]\mu_A:\mathbb C \lt X,X^* \gt \to\mathbb C[/math] given by:

[[math]] P\to tr(P(A)) [[/math]]
In the case where we have a probability measure [math]\mu_A\in\mathcal P(\mathbb C)[/math] such that

[[math]] tr(P(A))=\int_\mathbb CP(x)\,d\mu_A(x) [[/math]]
we identify this complex measure with the distribution of [math]A[/math].

Observe that knowing the distribution is the same as knowing the moments, because if we write our noncommutative polynomial as [math]P=\sum_kc_kX^k[/math], then we have:

[[math]] tr(P(A)) =tr\left(\sum_kc_kA^k\right) =\sum_kc_kM_k [[/math]]
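In order to get familiar with these colored exponents, here is a short computational sketch, in Python with numpy; the encoding of the colored integers as words in the letters 'o' and 'b', standing for [math]\circ[/math] and [math]\bullet[/math], as well as the sample matrix, are ad-hoc choices, made for illustration only:

```python
import numpy as np

def colored_moment(A, word):
    """tr(A^k) for a colored integer k, encoded as a word in the letters
    'o' (standing for A) and 'b' (standing for A*)."""
    prod = np.eye(A.shape[0], dtype=complex)
    for letter in word:
        prod = prod @ (A if letter == 'o' else A.conj().T)
    return np.trace(prod) / A.shape[0]      # normalized trace tr = Tr/N

np.random.seed(1)
A = np.random.randn(3, 3) + 1j * np.random.randn(3, 3)

print(colored_moment(A, 'o'))       # tr(A)
print(colored_moment(A, 'ob'))      # tr(AA*)
print(colored_moment(A, 'obbo'))    # tr(AA*A*A)
```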


As a first result now, coming from Theorem 5.3, we have:

Theorem

Given a matrix [math]A\in M_N(\mathbb C)[/math] which is self-adjoint, [math]A=A^*[/math], we have the following formula, valid for any polynomial [math]P\in\mathbb C \lt X,X^* \gt [/math],

[[math]] tr(P(A))=\frac{1}{N}(P(\lambda_1)+\ldots+P(\lambda_N)) [[/math]]
where [math]\lambda_1,\ldots,\lambda_N\in\mathbb C[/math] are the eigenvalues of [math]A[/math]. Thus the complex measure

[[math]] \mu_A=\frac{1}{N}(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N}) [[/math]]
is the distribution of [math]A[/math], in the abstract sense of Definition 5.5.


Proof

This follows indeed from Theorem 5.3, because due to our self-adjointness assumption [math]A=A^*[/math], the adjoint matrix plays no role in all this.

Quite remarkably, Theorem 5.6 extends to the normal case. This is something non-trivial, that we will explain now, after some linear algebra. Let us start with:

Proposition

Any matrix [math]A\in M_N(\mathbb C)[/math] which is self-adjoint, [math]A=A^*[/math], is diagonalizable, with the diagonalization being of the following type,

[[math]] A=UDU^* [[/math]]
with [math]U\in U_N[/math], and with [math]D\in M_N(\mathbb R)[/math] diagonal. The converse holds too.


Proof

Let us first prove that the eigenvalues are real. If [math]Ax=\lambda x[/math], we have:

[[math]] \begin{eqnarray*} \lambda \lt x,x \gt &=& \lt Ax,x \gt \\ &=& \lt x,Ax \gt \\ &=&\bar{\lambda} \lt x,x \gt \end{eqnarray*} [[/math]]


Thus we obtain [math]\lambda\in\mathbb R[/math], as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

[[math]] Ax=\lambda x\quad,\quad Ay=\mu y [[/math]]


We have then the following computation, by using [math]\lambda,\mu\in\mathbb R[/math]:

[[math]] \begin{eqnarray*} \lambda \lt x,y \gt &=& \lt Ax,y \gt \\ &=& \lt x,Ay \gt \\ &=&\mu \lt x,y \gt \end{eqnarray*} [[/math]]


Thus [math]\lambda\neq\mu[/math] implies [math]x\perp y[/math], as claimed. In order now to finish, it remains to prove that the eigenspaces span the whole [math]\mathbb C^N[/math]. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix, [math]Ax=\lambda x[/math]. Assuming [math]x\perp y[/math], we have:

[[math]] \begin{eqnarray*} \lt Ay,x \gt &=& \lt y,Ax \gt \\ &=& \lt y,\lambda x \gt \\ &=&\lambda \lt y,x \gt \\ &=&0 \end{eqnarray*} [[/math]]


Thus, if [math]x[/math] is an eigenvector of [math]A[/math], then the vector space [math]x^\perp[/math] is invariant under [math]A[/math]. On the other hand, since a square matrix [math]A[/math] is self-adjoint precisely when [math] \lt Ax,x \gt \in\mathbb R[/math], we conclude that the restriction of our matrix [math]A[/math] to the vector space [math]x^\perp[/math] is self-adjoint. Thus, we can proceed by recurrence, and we obtain in this way the result.

Let us discuss as well the case of the unitary matrices. We have here:

Proposition

Any matrix [math]U\in M_N(\mathbb C)[/math] which is unitary, [math]U^*=U^{-1}[/math], is diagonalizable, with the eigenvalues being on [math]\mathbb T[/math]. More precisely we have

[[math]] U=VDV^* [[/math]]
with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] diagonal. The converse holds too.


Proof

Assuming [math]Ux=\lambda x[/math], we have the following formula:

[[math]] \begin{eqnarray*} \lt x,x \gt &=& \lt U^*Ux,x \gt \\ &=& \lt Ux,Ux \gt \\ &=& \lt \lambda x,\lambda x \gt \\ &=&|\lambda|^2 \lt x,x \gt \end{eqnarray*} [[/math]]


Thus we obtain [math]\lambda\in\mathbb T[/math], as desired. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

[[math]] Ux=\lambda x\quad,\quad Uy=\mu y [[/math]]


We have then the following computation, by using [math]U^*=U^{-1}[/math] and [math]\lambda,\mu\in\mathbb T[/math]:

[[math]] \begin{eqnarray*} \lambda \lt x,y \gt &=& \lt Ux,y \gt \\ &=& \lt x,U^*y \gt \\ &=& \lt x,U^{-1}y \gt \\ &=& \lt x,\mu^{-1}y \gt \\ &=&\mu \lt x,y \gt \end{eqnarray*} [[/math]]


Thus [math]\lambda\neq\mu[/math] implies [math]x\perp y[/math], as claimed. In order now to finish, it remains to prove that the eigenspaces span the whole [math]\mathbb C^N[/math]. For this purpose, we will use a recurrence method. Let us pick an eigenvector, [math]Ux=\lambda x[/math]. Assuming [math]x\perp y[/math], we have:

[[math]] \begin{eqnarray*} \lt Uy,x \gt &=& \lt y,U^*x \gt \\ &=& \lt y,U^{-1}x \gt \\ &=& \lt y,\lambda^{-1}x \gt \\ &=&\lambda \lt y,x \gt \\ &=&0 \end{eqnarray*} [[/math]]


Thus, if [math]x[/math] is an eigenvector of [math]U[/math], then the vector space [math]x^\perp[/math] is invariant under [math]U[/math]. Now since [math]U[/math] is an isometry, so is its restriction to this space [math]x^\perp[/math]. Thus this restriction is a unitary, and so we can proceed by recurrence, and we obtain the result.

We have in fact the following general result, extending what we know so far:

Theorem

Any matrix [math]A\in M_N(\mathbb C)[/math] which is normal, [math]AA^*=A^*A[/math], is diagonalizable, with the diagonalization being of the following type,

[[math]] A=UDU^* [[/math]]
with [math]U\in U_N[/math], and with [math]D\in M_N(\mathbb C)[/math] diagonal. The converse holds too.


Proof

This is something quite technical. Our first claim is that a matrix [math]A[/math] is normal precisely when the following is satisfied, for any vector [math]x[/math]:

[[math]] ||Ax||=||A^*x|| [[/math]]


Indeed, this equality can be written in the following way, valid for any vector [math]x[/math], and since an operator [math]B[/math] satisfying [math] \lt Bx,x \gt =0[/math] for any [math]x[/math] must vanish, this gives [math]AA^*=A^*A[/math]:

[[math]] \lt AA^*x,x \gt = \lt A^*Ax,x \gt [[/math]]


Our claim now is that [math]A,A^*[/math] have the same eigenvectors, with conjugate eigenvalues:

[[math]] Ax=\lambda x\implies A^*x=\bar{\lambda}x [[/math]]


Indeed, this follows from the following computation, and from the trivial fact that if [math]A[/math] is normal, then so is any matrix of type [math]A-\lambda 1_N[/math], with [math]\lambda\in\mathbb C[/math]:

[[math]] \begin{eqnarray*} ||(A^*-\bar{\lambda}1_N)x|| &=&||(A-\lambda 1_N)^*x||\\ &=&||(A-\lambda 1_N)x||\\ &=&0 \end{eqnarray*} [[/math]]


Let us prove now, by using this fact, that the eigenspaces of [math]A[/math] are pairwise orthogonal. Assuming [math]Ax=\lambda x[/math] and [math]Ay=\mu y[/math] with [math]\lambda\neq\mu[/math], we have:

[[math]] \begin{eqnarray*} \lambda \lt x,y \gt &=& \lt Ax,y \gt \\ &=& \lt x,A^*y \gt \\ &=& \lt x,\bar{\mu}y \gt \\ &=&\mu \lt x,y \gt \end{eqnarray*} [[/math]]


Thus [math]\lambda\neq\mu[/math] implies [math]x\perp y[/math], as desired. In order to finish now the proof, it remains to prove that the eigenspaces of [math]A[/math] span the whole [math]\mathbb C^N[/math]. This is something quite tricky, and our plan here will be that of proving that the eigenspaces of [math]AA^*[/math] are invariant under [math]A[/math]. In order to do so, let us pick two eigenvectors [math]x,y[/math] of the matrix [math]AA^*[/math], corresponding to different eigenvalues, [math]\lambda\neq\mu[/math]. The eigenvalue equations are then as follows:

[[math]] AA^*x=\lambda x\quad,\quad AA^*y=\mu y [[/math]]


We have the following computation, by using the normality condition [math]AA^*=A^*A[/math], and the fact that the eigenvalues of [math]AA^*[/math], and in particular [math]\mu[/math], are real:

[[math]] \begin{eqnarray*} \lambda \lt Ax,y \gt &=& \lt A\lambda x,y \gt \\ &=& \lt AAA^*x,y \gt \\ &=& \lt AA^*Ax,y \gt \\ &=& \lt Ax,AA^*y \gt \\ &=& \lt Ax,\mu y \gt \\ &=&\mu \lt Ax,y \gt \end{eqnarray*} [[/math]]


We conclude that we have [math] \lt Ax,y \gt =0[/math]. But this reformulates as follows:

[[math]] \lambda\neq\mu\implies A(E_\lambda)\perp E_\mu [[/math]]


Now since the eigenspaces of [math]AA^*[/math] are pairwise orthogonal, and span the whole [math]\mathbb C^N[/math], we deduce that these eigenspaces are invariant under [math]A[/math]:

[[math]] A(E_\lambda)\subset E_\lambda [[/math]]


But with this result in hand, we can now finish. Indeed, we can decompose the problem, and the matrix [math]A[/math] itself, following these eigenspaces of [math]AA^*[/math], which in practice amounts to saying that we can assume that we have a single eigenspace. By rescaling, this is the same as assuming that we have [math]AA^*=1[/math], so we are now in the unitary case, which we know how to solve, as explained in Proposition 5.8.

Getting back now to the laws of matrices, Theorem 5.6 extends to the normal case, [math]AA^*=A^*A[/math]. This is something non-trivial, the result being as follows:

Theorem

Given a matrix [math]A\in M_N(\mathbb C)[/math] which is normal, [math]AA^*=A^*A[/math], we have the following formula, valid for any polynomial [math]P\in\mathbb C \lt X,X^* \gt [/math],

[[math]] tr(P(A))=\frac{1}{N}(P(\lambda_1)+\ldots+P(\lambda_N)) [[/math]]
where [math]\lambda_1,\ldots,\lambda_N\in\mathbb C[/math] are the eigenvalues of [math]A[/math]. Thus the complex measure

[[math]] \mu_A=\frac{1}{N}(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N}) [[/math]]
is the distribution of [math]A[/math], in the abstract sense of Definition 5.5.


Proof

There are several proofs for this fact, one of them being as follows:


(1) Let us first consider the case where the matrix is diagonal:

[[math]] A=\begin{pmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N\end{pmatrix} [[/math]]


The moments of [math]A[/math] are then given by the following formula:

[[math]] M_k=\frac{1}{N}(\lambda_1^k+\ldots+\lambda_N^k) [[/math]]


Regarding now the distribution, this is by definition given by:

[[math]] \mu_A:\mathbb C \lt X,X^* \gt \to\mathbb C\quad,\quad P\to tr(P(A)) [[/math]]


Since the matrix is normal, [math]AA^*=A^*A[/math], knowing this distribution is the same as knowing its restriction to the usual polynomials in two variables:

[[math]] \mu_A:\mathbb C[X,X^*]\to\mathbb C\quad,\quad P\to tr(P(A)) [[/math]]


By using now the fact that [math]A[/math] is diagonal, we conclude that the distribution is:

[[math]] \mu_A:\mathbb C[X,X^*]\to\mathbb C\quad,\quad P\to\frac{1}{N}(P(\lambda_1)+\ldots+P(\lambda_N)) [[/math]]


But this functional corresponds to integrating [math]P[/math] with respect to the following complex measure, that we agree to still denote by [math]\mu_A[/math], and call distribution of [math]A[/math]:

[[math]] \mu_A=\frac{1}{N}(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N}) [[/math]]


(2) In the general case now, where [math]A\in M_N(\mathbb C)[/math] is normal and arbitrary, we can use Theorem 5.9, which tells us that [math]A[/math] is diagonalizable, and in fact that [math]A,A^*[/math] are jointly diagonalizable. To be more precise, let us write, as in Theorem 5.9:

[[math]] A=UDU^* [[/math]]


Here [math]U\in U_N[/math], and [math]D\in M_N(\mathbb C)[/math] is diagonal. The adjoint matrix is then given by:

[[math]] A^*=UD^*U^* [[/math]]


As before in the diagonal matrix case, since our matrix is normal, [math]AA^*=A^*A[/math], knowing its distribution in the abstract sense of Definition 5.5 is the same as knowing the restriction of this abstract distribution to the usual polynomials in two variables:

[[math]] \mu_A:\mathbb C[X,X^*]\to\mathbb C\quad,\quad P\to tr(P(A)) [[/math]]


In order now to compute this functional, we can change the basis via the above unitary matrix [math]U\in U_N[/math], which in practice means that we can assume [math]U=1[/math]. Thus, by using now (1), if we denote by [math]\lambda_1,\ldots,\lambda_N[/math] the diagonal entries of [math]D[/math], which are the eigenvalues of [math]A[/math], the distribution that we are looking for is the following functional:

[[math]] \mu_A:\mathbb C[X,X^*]\to\mathbb C\quad,\quad P\to\frac{1}{N}(P(\lambda_1)+\ldots+P(\lambda_N)) [[/math]]


As before, this functional corresponds to integrating [math]P[/math] with respect to the following complex measure, that we agree to still denote by [math]\mu_A[/math], and call distribution of [math]A[/math]:

[[math]] \mu_A=\frac{1}{N}(\delta_{\lambda_1}+\ldots+\delta_{\lambda_N}) [[/math]]


Thus, we are led to the conclusion in the statement.
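Here is a numerical check of this statement, once again in Python with numpy, for a normal matrix which is not self-adjoint; the construction [math]A=UDU^*[/math] below, with [math]U[/math] coming from a QR decomposition, and the particular [math]*[/math]-moment which is tested, are arbitrary illustrative choices:

```python
import numpy as np

np.random.seed(2)
N = 4

# a normal, non-self-adjoint matrix A = UDU*, with U unitary and D complex diagonal
U, _ = np.linalg.qr(np.random.randn(N, N) + 1j * np.random.randn(N, N))
lam = np.random.randn(N) + 1j * np.random.randn(N)
A = U @ np.diag(lam) @ U.conj().T
assert np.allclose(A @ A.conj().T, A.conj().T @ A)     # normality check

# the *-moment tr(A^2 A*), computed algebraically, and via the eigenvalues
lhs = np.trace(A @ A @ A.conj().T) / N
rhs = np.mean(lam**2 * np.conj(lam))                   # integral of z^2 zbar against mu_A
print(lhs, rhs)                                        # the two numbers coincide
```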

We can now go ahead and discuss, finally, the case of the random matrices, where things become truly interesting. We can extend Definition 5.5, as follows:

Definition

The colored moments of a random matrix

[[math]] Z\in M_N(L^\infty(X)) [[/math]]
are the following numbers, indexed by the colored integers [math]k=\circ\bullet\bullet\circ\ldots[/math]

[[math]] M_k=\int_Xtr(Z^k) [[/math]]
with the powers [math]Z^k[/math] being defined by [math]Z^\circ=Z[/math], [math]Z^\bullet=Z^*[/math] and multiplicativity.

Observe that this notion extends indeed the notion from Definition 5.5 for the usual matrices [math]Z\in M_N(\mathbb C)[/math], which can be recovered with [math]X=\{.\}[/math]. Also, in the case [math]N=1[/math], where our matrix is just a random variable [math]Z\in L^\infty(X)[/math], we recover in this way the usual moments, or rather the joint moments of the random variables [math]Z,\bar{Z}[/math]. Regarding now the distribution, we can use here a similar extension, as follows:

Definition

The distribution of a random matrix [math]Z\in M_N(L^\infty(X))[/math] is the abstract functional [math]\mu_Z:\mathbb C \lt X,X^* \gt \to\mathbb C[/math] given by:

[[math]] P\to\int_Xtr(P(Z)) [[/math]]
In the case where we have a probability measure [math]\mu_Z\in\mathcal P(\mathbb C)[/math] such that

[[math]] \int_Xtr(P(Z))=\int_\mathbb CP(x)\,d\mu_Z(x) [[/math]]
we identify this measure with the distribution, or law of [math]Z[/math].

As basic examples, for the usual matrices [math]Z\in M_N(\mathbb C)[/math], obtained by taking [math]X=\{.\}[/math], we obtain the previous notion of distribution of a matrix, from Definition 5.5. Also, for the usual random variables [math]Z\in L^\infty(X)[/math], obtained by taking [math]N=1[/math], we obtain in this way the previous notion of distribution of a random variable, from chapters 1-2.

5b. Bounded operators

In order to further clarify all the above, and to discuss as well what happens in the non-normal case, we will need an extension of the theory that we have, going beyond the random matrix setting, by using some basic functional analysis and spectral theory. In order to get started, let us formulate the following definition:

Definition

A Hilbert space is a complex vector space [math]H[/math] given with a scalar product [math] \lt x,y \gt [/math], satisfying the following conditions:

  • [math] \lt x,y \gt [/math] is linear in [math]x[/math], and antilinear in [math]y[/math].
  • [math]\overline{ \lt x,y \gt }= \lt y,x \gt [/math], for any [math]x,y[/math].
  • [math] \lt x,x \gt \gt 0[/math], for any [math]x\neq0[/math].
  • [math]H[/math] is complete with respect to the norm [math]||x||=\sqrt{ \lt x,x \gt }[/math].

Here the fact that [math]||.||[/math] is indeed a norm comes from the Cauchy-Schwarz inequality, which states that if the conditions (1,2,3) above are satisfied, then we have:

[[math]] | \lt x,y \gt |\leq||x||\cdot||y|| [[/math]]


Indeed, this inequality comes from the fact that the following quantity, viewed as a degree 2 polynomial in [math]t\in\mathbb R[/math], with [math]w\in\mathbb T[/math] suitably chosen, takes positive values, and so its discriminant must be negative:

[[math]] f(t)=||x+twy||^2 [[/math]]


At the level of the examples, we first have the Hilbert space [math]H=\mathbb C^N[/math], with its usual scalar product, which is by definition linear in the first variable, namely:

[[math]] \lt x,y \gt =\sum_ix_i\bar{y}_i [[/math]]


More generally, making the link with probability, we have the following result:

Proposition

Given a measured space [math]X[/math], the functions [math]f:X\to\mathbb C[/math], taken up to equality almost everywhere, which are square-integrable,

[[math]] \int_X|f(x)|^2dx \lt \infty [[/math]]
form a Hilbert space [math]L^2(X)[/math], with the following scalar product:

[[math]] \lt f,g \gt =\int_Xf(x)\overline{g(x)}\,dx [[/math]]
In the case where [math]X=I[/math] is a set endowed with its counting measure, we obtain the space [math]l^2(I)[/math] of square-summable sequences [math]\{x_i\}_{i\in I}\subset\mathbb C[/math], with [math] \lt x,y \gt =\sum_ix_i\bar{y}_i[/math].


Proof

There are several things to be proved, as follows:


(1) Our first claim is that [math]L^2(X)[/math] is a vector space, and here we must prove that [math]f,g\in L^2(X)[/math] implies [math]f+g\in L^2(X)[/math]. But this leads us into proving [math]||f+g||\leq||f||+||g||[/math], where [math]||f||=\sqrt{ \lt f,f \gt }[/math]. Now since this inequality holds on each subspace [math]\mathbb C^N\subset L^2(X)[/math] coming from step functions, this inequality holds everywhere, as desired.


(2) Our second claim is that [math] \lt \,, \gt [/math] is well-defined on [math]L^2(X)[/math]. But this follows from the Cauchy-Schwarz inequality, [math]| \lt f,g \gt |\leq||f||\cdot||g||[/math], which can be established by truncating, a bit like we established the Minkowski inequality in (1) above.


(3) It is also clear that [math] \lt \,, \gt [/math] is a scalar product on [math]L^2(X)[/math], with the remark here that if we want to have [math] \lt f,f \gt \gt 0[/math] for [math]f\neq 0[/math], we must declare that [math]f=0[/math] when [math]f=0[/math] almost everywhere, and so that [math]f=g[/math] when [math]f=g[/math] almost everywhere, as stated.


(4) It remains to prove that [math]L^2(X)[/math] is complete with respect to [math]||f||=\sqrt{ \lt f,f \gt }[/math]. But this is clear, because if we pick a Cauchy sequence [math]\{f_n\}_{n\in\mathbb N}\subset L^2(X)[/math], then we can construct a pointwise, and hence [math]L^2[/math] limit, [math]f_n\to f[/math], almost everywhere.


(5) Finally, the last assertion is clear, because the integration with respect to the counting measure is by definition a sum, and so we have [math]L^2(I)=l^2(I)[/math].

Quite remarkably, any Hilbert space must be of the form [math]L^2(X)[/math], and even of the special form [math]l^2(I)[/math]. This follows indeed from the following key result:

Theorem

Let [math]H[/math] be a Hilbert space.

  • Any algebraic basis of this space [math]\{f_i\}_{i\in I}[/math] can be turned into an orthonormal basis [math]\{e_i\}_{i\in I}[/math], by using the Gram-Schmidt procedure.
  • Thus, [math]H[/math] has an orthonormal basis, and so we have [math]H\simeq l^2(I)[/math], with [math]I[/math] being the indexing set for this orthonormal basis.


Proof

All this is standard by Gram-Schmidt, the idea being as follows:


(1) First of all, in finite dimensions an orthonormal basis [math]\{e_i\}_{i\in I}[/math] is by definition a usual algebraic basis, satisfying [math] \lt e_i,e_j \gt =\delta_{ij}[/math]. But the existence of such a basis follows by applying the Gram-Schmidt procedure to any algebraic basis [math]\{f_i\}_{i\in I}[/math], as claimed.


(2) In infinite dimensions, we can say that [math]\{f_i\}_{i\in I}[/math] is a basis of [math]H[/math] when the functions [math]f_i[/math] are linearly independent, and when the finite linear combinations of these functions [math]f_i[/math] form a dense subspace of [math]H[/math]. For orthogonal bases [math]\{e_i\}_{i\in I}[/math] these definitions are equivalent, and in any case, our statement now makes sense.


(3) Regarding now the proof, in infinite dimensions, this follows again from Gram-Schmidt, exactly as in the finite dimensional case, but by using this time a tool from logic, called Zorn's lemma, in order to correctly perform the recurrence.

The above result is something quite subtle, and suggests formulating:

Definition

A Hilbert space [math]H[/math] is called separable when the following equivalent conditions are satisfied:

  • [math]H[/math] has a countable algebraic basis [math]\{f_i\}_{i\in\mathbb N}[/math].
  • [math]H[/math] has a countable orthonormal basis [math]\{e_i\}_{i\in\mathbb N}[/math].
  • We have [math]H\simeq l^2(\mathbb N)[/math], isomorphism of Hilbert spaces.

As a first observation, according to the above, there is up to isomorphism only one separable Hilbert space, namely:

[[math]] H=l^2(\mathbb N) [[/math]]


This is, however, quite tricky, and can be a bit misleading. Consider for instance the space [math]H=L^2[0,1][/math] of square-integrable functions [math]f:[0,1]\to\mathbb C[/math], with:

[[math]] \lt f,g \gt =\int_0^1f(x)\overline{g(x)}dx [[/math]]


This space is of course separable, because we can use the basis [math]f_n=x^n[/math] with [math]n\in\mathbb N[/math], orthogonalized by Gram-Schmidt. However, the orthogonalization procedure is something non-trivial, so the isomorphism [math]H\simeq l^2(\mathbb N)[/math] that we obtain is non-trivial as well.
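For the reader wanting to see this orthogonalization at work, here is a minimal numerical sketch, in Python, assuming numpy and its numpy.polynomial module; the choice of degree [math]3[/math] is of course arbitrary:

```python
import numpy as np
from numpy.polynomial import Polynomial

def inner(p, q):
    """Scalar product <p,q> = int_0^1 p(x)q(x) dx, for real polynomials."""
    r = (p * q).integ()
    return r(1.0) - r(0.0)

# Gram-Schmidt applied to the monomials 1, x, x^2, x^3, inside L^2[0,1]
monomials = [Polynomial([0.0] * k + [1.0]) for k in range(4)]
basis = []
for f in monomials:
    e = f
    for b in basis:
        e = e - inner(f, b) * b
    basis.append(e / np.sqrt(inner(e, e)))

for e in basis:
    print(e.coef)     # the shifted Legendre polynomials, up to normalization
```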


Let us get now into the study of linear operators. We have here:

Theorem

Given a Hilbert space [math]H[/math], the linear operators [math]T:H\to H[/math] which are bounded, in the sense that the quantity

[[math]] ||T||=\sup_{||x||\leq1}||Tx|| [[/math]]
is finite, form a complex algebra [math]B(H)[/math], having the following properties:

  • [math]B(H)[/math] is complete with respect to [math]||.||[/math], and so we have a Banach algebra.
  • [math]B(H)[/math] has an involution [math]T\to T^*[/math], given by [math] \lt Tx,y \gt = \lt x,T^*y \gt [/math].

In addition, the norm and the involution are related by the formula [math]||TT^*||=||T||^2[/math].


Proof

The fact that we have indeed an algebra follows from:

[[math]] ||S+T||\leq||S||+||T||\quad,\quad ||\lambda T||=|\lambda|\cdot||T||\quad,\quad ||ST||\leq||S||\cdot||T|| [[/math]]


(1) Assuming that [math]\{T_k\}\subset B(H)[/math] is a Cauchy sequence, the sequence [math]\{T_kx\}[/math] is Cauchy for any [math]x\in H[/math], so we can define the limit [math]T=\lim_{k\to\infty}T_k[/math] by setting:

[[math]] Tx=\lim_{k\to\infty}T_kx [[/math]]


It is routine then to check that this formula defines indeed an operator [math]T\in B(H)[/math], and that we have [math]T_k\to T[/math] in norm, and this gives the result.


(2) The existence of [math]T^*[/math] comes from the fact that [math]\psi(x)= \lt Tx,y \gt [/math] is a continuous linear map [math]H\to\mathbb C[/math], so by the Riesz representation theorem we must have a formula as follows, for a certain vector [math]T^*y\in H[/math]:

[[math]] \psi(x)= \lt x,T^*y \gt [[/math]]


Moreover, since this vector [math]T^*y[/math] is unique, [math]T^*[/math] is unique too, and we have as well:

[[math]] (S+T)^*=S^*+T^*\quad,\quad (\lambda T)^*=\bar{\lambda}T^* [[/math]]

[[math]] (ST)^*=T^*S^*\quad,\quad (T^*)^*=T [[/math]]


Observe also that we have indeed [math]T^*\in B(H)[/math], due to the following equality:

[[math]] \begin{eqnarray*} ||T|| &=&\sup_{||x||=1}\sup_{||y||=1} \lt Tx,y \gt \\ &=&\sup_{||y||=1}\sup_{||x||=1} \lt x,T^*y \gt \\ &=&||T^*|| \end{eqnarray*} [[/math]]


(3) Regarding now the last assertion, observe first that we have:

[[math]] ||TT^*|| \leq||T||\cdot||T^*|| =||T||^2 [[/math]]


On the other hand, we have as well the following estimate:

[[math]] \begin{eqnarray*} ||T||^2 &=&\sup_{||x||=1}| \lt Tx,Tx \gt |\\ &=&\sup_{||x||=1}| \lt x,T^*Tx \gt |\\ &\leq&||T^*T|| \end{eqnarray*} [[/math]]


Now by replacing in this formula [math]T\to T^*[/math] we obtain [math]||T||^2\leq||TT^*||[/math]. Thus, we have proved both the needed inequalities, and we are done.
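Here is a small numerical illustration of these norm properties, in Python with numpy, in the finite dimensional case [math]B(H)=M_N(\mathbb C)[/math]; the matrix below is an arbitrary test choice:

```python
import numpy as np

np.random.seed(3)
T = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
norm = lambda M: np.linalg.norm(M, 2)        # operator norm = largest singular value

print(norm(T), norm(T.conj().T))             # ||T|| = ||T*||
print(norm(T @ T.conj().T), norm(T) ** 2)    # ||TT*|| = ||T||^2
```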

In the case where [math]H[/math] comes with a basis [math]\{e_i\}_{i\in I}[/math], we can talk about the infinite matrices [math]M\in M_I(\mathbb C)[/math], with the remark that the multiplication of such matrices is not always defined, in the case [math]|I|=\infty[/math]. In this context, we have the following result:

Proposition

Let [math]H[/math] be a Hilbert space, with orthonormal basis [math]\{e_i\}_{i\in I}[/math]. The bounded operators [math]T\in B(H)[/math] can be then identified with matrices [math]M\in M_I(\mathbb C)[/math] via

[[math]] Tx=Mx\quad,\quad M_{ij}= \lt Te_j,e_i \gt [[/math]]
and we obtain in this way an embedding as follows, which is multiplicative:

[[math]] B(H)\subset M_I(\mathbb C) [[/math]]
In the case [math]H=\mathbb C^N[/math] we obtain in this way the usual isomorphism [math]B(H)\simeq M_N(\mathbb C)[/math]. In the separable case we obtain in this way a proper embedding [math]B(H)\subset M_\infty(\mathbb C)[/math].


Proof

We have several assertions to be proved, the idea being as follows:


(1) Regarding the first assertion, given a bounded operator [math]T:H\to H[/math], let us associate to it a matrix [math]M\in M_I(\mathbb C)[/math] as in the statement, by the following formula:

[[math]] M_{ij}= \lt Te_j,e_i \gt [[/math]]


It is clear that this correspondence [math]T\to M[/math] is linear, and also that its kernel is [math]\{0\}[/math]. Thus, we have an embedding of linear spaces [math]B(H)\subset M_I(\mathbb C)[/math].


(2) Our claim now is that this embedding is multiplicative. But this is clear too, because if we denote by [math]T\to M_T[/math] our correspondence, we have:

[[math]] \begin{eqnarray*} (M_{ST})_{ij} &=&\sum_k \lt Se_k,e_i \gt \lt Te_j,e_k \gt \\ &=&\sum_k(M_S)_{ik}(M_T)_{kj}\\ &=&(M_SM_T)_{ij} \end{eqnarray*} [[/math]]


(3) Finally, we must prove that the original operator [math]T:H\to H[/math] can be recovered from its matrix [math]M\in M_I(\mathbb C)[/math] via the formula in the statement, namely [math]Tx=Mx[/math]. But this latter formula holds for the vectors of the basis, [math]x=e_j[/math], because we have:

[[math]] (Te_j)_i = \lt Te_j,e_i \gt =M_{ij} =(Me_j)_i [[/math]]


Now by linearity we obtain from this that the formula [math]Tx=Mx[/math] holds everywhere, on any vector [math]x\in H[/math], and this finishes the proof of the first assertion.


(4) In finite dimensions we obtain an isomorphism, because any matrix [math]M\in M_N(\mathbb C)[/math] determines an operator [math]T:\mathbb C^N\to\mathbb C^N[/math], according to the formula [math] \lt Te_j,e_i \gt =M_{ij}[/math]. In infinite dimensions, however, we do not have an isomorphism. For instance on [math]H=l^2(\mathbb N)[/math] the following matrix does not define an operator:

[[math]] M=\begin{pmatrix}1&1&\ldots\\ 1&1&\ldots\\ \vdots&\vdots \end{pmatrix} [[/math]]


Indeed, [math]T(e_1)[/math] should be the all-one vector, which is not square-summable.
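This can also be seen at the level of norms: the finite truncations of the above matrix have operator norms going to infinity, as the following short Python/numpy computation, with arbitrary truncation sizes, illustrates:

```python
import numpy as np

# the N x N truncations of the all-one matrix have operator norm N, which is
# unbounded as N grows, so the full matrix defines no bounded operator on l^2(N)
for N in (2, 5, 10, 50):
    print(N, np.linalg.norm(np.ones((N, N)), 2))
```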

5c. Operator algebras

We will be interested here in the algebras of operators, rather than in the operators themselves. The axioms here, coming from Theorem 5.17, are as follows:

Definition

A [math]C^*[/math]-algebra is a complex algebra [math]A[/math] with unit, having:

  • A norm [math]a\to||a||[/math], making it a Banach algebra (the Cauchy sequences converge).
  • An involution [math]a\to a^*[/math], which satisfies [math]||aa^*||=||a||^2[/math], for any [math]a\in A[/math].

As basic examples here, we have the usual matrix algebras [math]M_N(\mathbb C)[/math], with the norm and involution being the usual matrix norm and involution, given by:

[[math]] ||A||=\sup_{||x||=1}||Ax||\quad,\quad (A^*)_{ij}=\overline{A}_{ji} [[/math]]


Some other basic examples are the algebras [math]L^\infty(X)[/math] of essentially bounded functions [math]f:X\to\mathbb C[/math] on a measured space [math]X[/math], with the usual norm and involution, namely:

[[math]] ||f||=\sup_{x\in X}|f(x)|\quad,\quad f^*(x)=\overline{f(x)} [[/math]]


We can put these two basic classes of examples together, as follows:

Proposition

The random matrix algebras [math]A=M_N(L^\infty(X))[/math] are [math]C^*[/math]-algebras, with their usual norm and involution, given by:

[[math]] ||Z||=\sup_{x\in X}||Z_x||\quad,\quad (Z^*)_{ij}=\overline{Z}_{ji} [[/math]]
These algebras generalize both the algebras [math]M_N(\mathbb C)[/math], and the algebras [math]L^\infty(X)[/math].


Proof

The fact that the [math]C^*[/math]-algebra axioms are satisfied is clear from definitions. As for the last assertion, this follows by taking [math]X=\{.\}[/math] and [math]N=1[/math], respectively.

In order to study the [math]C^*[/math]-algebras, the key observation is that, due to Theorem 5.17, the algebra [math]B(H)[/math] of bounded linear operators [math]T:H\to H[/math] on a Hilbert space [math]H[/math] is a [math]C^*[/math]-algebra. More generally, any closed [math]*[/math]-subalgebra [math]A\subset B(H)[/math] is a [math]C^*[/math]-algebra. It is possible to prove that any [math]C^*[/math]-algebra appears in this way, [math]A\subset B(H)[/math], and we will be back to this later. For the moment, let us just record the following elementary result, dealing with the random matrix case, that we are mainly interested in here:

Theorem

Any algebra of type [math]L^\infty(X)[/math] is an operator algebra, as follows:

[[math]] L^\infty(X)\subset B(L^2(X))\quad,\quad f\to(g\to fg) [[/math]]
More generally, any random matrix algebra is an operator algebra, as follows,

[[math]] M_N(L^\infty(X))\subset B\left(\mathbb C^N\otimes L^2(X)\right) [[/math]]
with the embedding being the above one, tensored with the identity.


Proof

We have two assertions to be proved, the idea being as follows:


(1) Given [math]f\in L^\infty(X)[/math], consider the following operator, acting on [math]H=L^2(X)[/math]:

[[math]] T_f(g)=fg [[/math]]


Observe that [math]T_f[/math] is indeed well-defined, and bounded as well, because:

[[math]] ||fg||_2 =\sqrt{\int_X|f(x)|^2|g(x)|^2d\mu(x)} \leq||f||_\infty||g||_2 [[/math]]


The map [math]f\to T_f[/math] being linear, involutive, continuous, and injective as well, we obtain in this way a [math]C^*[/math]-algebra embedding [math]L^\infty(X)\subset B(H)[/math], as desired.


(2) Regarding the second assertion, this is best viewed in the following way:

[[math]] \begin{eqnarray*} M_N(L^\infty(X)) &=&M_N(\mathbb C)\otimes L^\infty(X)\\ &\subset&M_N(\mathbb C)\otimes B(L^2(X))\\ &=&B\left(\mathbb C^N\otimes L^2(X)\right) \end{eqnarray*} [[/math]]


Here we have used (1), and some standard tensor product identifications.
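As an illustration of the embedding in (1), here is a small numerical sketch in Python with numpy, where [math]X=[0,1][/math] is discretized into sample points, so that the multiplication operator [math]T_f[/math] becomes a diagonal matrix; the function [math]f[/math] and the number of sample points below are arbitrary choices:

```python
import numpy as np

# discretization of X = [0,1]: the multiplication operator T_f acts on sampled
# functions as a diagonal matrix, and its operator norm is the sup norm of f
n = 1000
x = np.linspace(0, 1, n)
f = np.sin(2 * np.pi * x) + 0.5
T_f = np.diag(f)

print(np.linalg.norm(T_f, 2))     # operator norm of the discretized T_f
print(np.max(np.abs(f)))          # sup norm of f, the same number
```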

Our purpose in what follows is to develop the spectral theory of the [math]C^*[/math]-algebras, and in particular that of the random matrix algebras [math]A=M_N(L^\infty(X))[/math] that we are interested in, one of our objectives being that of talking about spectral measures, in the normal case, in analogy with what we know about the usual matrices. Let us start with:

Definition

The spectrum of an element [math]a\in A[/math] is the set

[[math]] \sigma(a)=\left\{\lambda\in\mathbb C\Big|a-\lambda\not\in A^{-1}\right\} [[/math]]
where [math]A^{-1}\subset A[/math] is the set of invertible elements.

Given an element [math]a\in A[/math], and a rational function [math]f=P/Q[/math] having poles outside [math]\sigma(a)[/math], we can construct the element [math]f(a)=P(a)Q(a)^{-1}[/math]. For simplicity, we write:

[[math]] f(a)=\frac{P(a)}{Q(a)} [[/math]]


With this convention, we have the following result:

Proposition

We have the “rational functional calculus” formula

[[math]] \sigma(f(a))=f(\sigma(a)) [[/math]]
valid for any rational function [math]f\in\mathbb C(X)[/math] having poles outside [math]\sigma(a)[/math].


Proof

We can prove this result in two steps, as follows:


(1) Assume first that we are in the usual polynomial case, [math]f\in\mathbb C[X][/math]. We pick a number [math]\lambda\in\mathbb C[/math], and we decompose the polynomial [math]f-\lambda[/math]:

[[math]] f(X)-\lambda=c(X-p_1)\ldots(X-p_n) [[/math]]


We have then, as desired, the following computation:

[[math]] \begin{eqnarray*} \lambda\notin\sigma(f(a)) &\iff&f(a)-\lambda\in A^{-1}\\ &\iff&c(a-p_1)\ldots(a-p_n)\in A^{-1}\\ &\iff&a-p_1,\ldots,a-p_n\in A^{-1}\\ &\iff&p_1,\ldots,p_n\notin\sigma(a)\\ &\iff&\lambda\notin f(\sigma(a)) \end{eqnarray*} [[/math]]


(2) In the general case now, [math]f\in\mathbb C(X)[/math], we pick [math]\lambda\in\mathbb C[/math], we write [math]f=P/Q[/math], and we set [math]R=P-\lambda Q[/math]. By using (1) above, we obtain:

[[math]] \begin{eqnarray*} \lambda\in\sigma(f(a)) &\iff&R(a)\notin A^{-1}\\ &\iff&0\in\sigma(R(a))\\ &\iff&0\in R(\sigma(a))\\ &\iff&\exists\mu\in\sigma(a),R(\mu)=0\\ &\iff&\lambda\in f(\sigma(a)) \end{eqnarray*} [[/math]]


Thus, we have obtained the formula in the statement.
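Here is a numerical illustration of this spectral mapping formula, in Python with numpy, in the matrix algebra case [math]A=M_N(\mathbb C)[/math]; the matrix and the rational function below are arbitrary test choices, with the pole chosen outside the spectrum:

```python
import numpy as np

np.random.seed(4)
A = np.random.randn(4, 4)
I = np.eye(4)

# the rational function f(z) = (z^2+1)/(z-5), whose pole lies outside sigma(A)
f_A = (A @ A + I) @ np.linalg.inv(A - 5 * I)

eig_A = np.linalg.eigvals(A)
print(np.sort_complex(np.linalg.eigvals(f_A)))        # sigma(f(A))
print(np.sort_complex((eig_A**2 + 1) / (eig_A - 5)))  # f(sigma(A)), the same set
```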

Given an element [math]a\in A[/math], its spectral radius [math]\rho (a)[/math] is the radius of the smallest disk centered at [math]0[/math] containing [math]\sigma(a)[/math]. With this convention, we have the following key result:

Theorem

Let [math]A[/math] be a [math]C^*[/math]-algebra.

  • The spectrum of a norm one element is in the unit disk.
  • The spectrum of a unitary element ([math]a^*=a^{-1}[/math]) is on the unit circle.
  • The spectrum of a self-adjoint element ([math]a=a^*[/math]) consists of real numbers.
  • The spectral radius of a normal element ([math]aa^*=a^*a[/math]) is equal to its norm.


Proof

We use the various results established above, as follows:


(1) This comes from the following basic formula, valid when [math]||a|| \lt 1[/math]:

[[math]] \frac{1}{1-a}=1+a+a^2+\ldots [[/math]]



(2) Assuming [math]a^*=a^{-1}[/math], we have the following computations:

[[math]] ||a||=\sqrt{||aa^*||}=\sqrt{1}=1 [[/math]]

[[math]] ||a^{-1}||=||a^*||=||a||=1 [[/math]]


If we denote by [math]D[/math] the unit disk, we obtain from this, by using (1):

[[math]] \sigma(a)\subset D\quad,\quad \sigma(a^{-1})\subset D [[/math]]


On the other hand, by using the function [math]f(z)=z^{-1}[/math], we have:

[[math]] \sigma(a^{-1})\subset D\implies \sigma(a)\subset D^{-1} [[/math]]


Thus we have [math]\sigma(a)\subset D\cap D^{-1}=\mathbb T[/math], as desired.


(3) This follows by using the result (2), just established above, and Proposition 5.23, with the following rational function, depending on a parameter [math]t\in\mathbb R[/math]:

[[math]] f(z)=\frac{z+it}{z-it} [[/math]]


Indeed, for [math]t \gt \gt 0[/math] the element [math]f(a)[/math] is well-defined, and we have:

[[math]] \left(\frac{a+it}{a-it}\right)^* =\frac{a-it}{a+it} =\left(\frac{a+it}{a-it}\right)^{-1} [[/math]]


Thus the element [math]f(a)[/math] is a unitary, and by using (2) above its spectrum is contained in [math]\mathbb T[/math]. We conclude that we have an inclusion as follows:

[[math]] f(\sigma(a))=\sigma(f(a))\subset\mathbb T [[/math]]


Thus, we obtain an inclusion [math]\sigma(a)\subset f^{-1}(\mathbb T)=\mathbb R[/math], and we are done.


(4) We already know from (1) that we have the following inequality:

[[math]] \rho(a)\leq||a|| [[/math]]


For the converse, we fix an arbitrary number [math]\rho \gt \rho(a)[/math]. We have then:

[[math]] \frac{1}{2\pi i}\int_{|z|=\rho}\frac{z^n}{z-a}\,dz =\sum_{k=0}^\infty\left(\frac{1}{2\pi i}\int_{|z|=\rho}z^{n-k-1}dz\right)a^k =a^n [[/math]]


By applying the norm and taking [math]n[/math]-th roots we obtain from this:

[[math]] \rho\geq\lim_{n\to\infty}||a^n||^{1/n} [[/math]]


In the case [math]a=a^*[/math] we have [math]||{a^n}||=||{a}||^n[/math] for any exponent of the form [math]n=2^k[/math], and by taking [math]n[/math]-th roots we get [math]\rho\geq||a||[/math]. But this gives the missing inequality, namely:

[[math]] \rho(a)\geq ||a|| [[/math]]


In the general case [math]aa^*=a^*a[/math] we have [math]a^n(a^n)^*=(aa^*)^n[/math]. Thus [math]\rho(a)^2=\rho(aa^*)[/math], and since the element [math]aa^*[/math] is self-adjoint, we obtain [math]\rho(aa^*)=||a||^2[/math], and we are done.
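The difference between the normal and the non-normal case, as well as the spectral radius formula [math]\rho(a)=\lim_{n\to\infty}||a^n||^{1/n}[/math], which is implicit in the above proof, can be illustrated numerically as follows, in Python with numpy, the matrices below being arbitrary test choices:

```python
import numpy as np

spectral_radius = lambda a: np.max(np.abs(np.linalg.eigvals(a)))
norm = lambda a: np.linalg.norm(a, 2)                 # operator norm

# self-adjoint, hence normal: spectral radius equals norm
np.random.seed(5)
B = np.random.randn(3, 3)
a = B + B.T
print(spectral_radius(a), norm(a))                    # equal

# non-normal: the spectral radius can be strictly smaller than the norm
c = np.array([[1.0, 5.0], [0.0, 2.0]])
print(spectral_radius(c), norm(c))                    # 2, versus about 5.6

# but rho(c) is recovered as the limit of ||c^k||^(1/k)
for k in (1, 4, 16, 64, 256):
    print(k, norm(np.linalg.matrix_power(c, k)) ** (1 / k))
```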

We are now in position of proving a key result, due to Gelfand, as follows:

Theorem

Any commutative [math]C^*[/math]-algebra is of the form

[[math]] A=C(X) [[/math]]
with its “spectrum” [math]X=Spec(A)[/math] appearing as the space of characters [math]\chi :A\to\mathbb C[/math].


Proof

Given a commutative [math]C^*[/math]-algebra [math]A[/math], we can define [math]X[/math] to be the set of characters [math]\chi :A\to\mathbb C[/math], with topology making continuous all evaluation maps [math]ev_a:\chi\to\chi(a)[/math]. Then [math]X[/math] is a compact space, and [math]a\to ev_a[/math] is a morphism of algebras, as follows:

[[math]] ev:A\to C(X) [[/math]]


(1) We first prove that [math]ev[/math] is involutive. For this purpose we use the following formula, which is similar to the [math]z=Re(z)+iIm(z)[/math] formula for usual complex numbers:

[[math]] a=\frac{a+a^*}{2}+i\cdot\frac{a-a^*}{2i} [[/math]]


Thus it is enough to prove the equality [math]ev_{a^*}=ev_a^*[/math] for self-adjoint elements [math]a[/math]. But this is the same as proving that [math]a=a^*[/math] implies that [math]ev_a[/math] is a real function, which is in turn true, because [math]ev_a(\chi)=\chi(a)[/math] is an element of the spectrum [math]\sigma(a)[/math], contained in [math]\mathbb R[/math].


(2) Since [math]A[/math] is commutative, each element is normal, so [math]ev[/math] is isometric, due to:

[[math]] ||ev_a|| =\rho(a) =||a|| [[/math]]


(3) It remains to prove that [math]ev[/math] is surjective. But this follows from the Stone-Weierstrass theorem, because [math]ev(A)[/math] is a closed subalgebra of [math]C(X)[/math], which separates the points.

As a main consequence of the Gelfand theorem, we have:

Theorem

For any normal element [math]a\in A[/math] we have an identification as follows:

[[math]] \lt a \gt =C(\sigma(a)) [[/math]]
In addition, given a function [math]f\in C(\sigma(a))[/math], we can apply it to [math]a[/math], and we have

[[math]] \sigma(f(a))=f(\sigma(a)) [[/math]]
which generalizes the previous rational calculus formula, in the normal case.


Proof

Since [math]a[/math] is normal, the [math]C^*[/math]-algebra [math] \lt a \gt [/math] that it generates is commutative, so if we denote by [math]X[/math] the space of the characters [math]\chi: \lt a \gt \to\mathbb C[/math], we have:

[[math]] \lt a \gt =C(X) [[/math]]


Now since the map [math]X\to\sigma(a)[/math] given by evaluation at [math]a[/math] is bijective, we obtain:

[[math]] \lt a \gt =C(\sigma(a)) [[/math]]


Thus, we are dealing here with usual functions, and this gives all the assertions.
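Here is a numerical sketch of this continuous functional calculus, in Python with numpy, written for simplicity in the self-adjoint matrix case, the general normal case being similar, with a unitary diagonalization; the positive matrix and the functions [math]\sqrt{\,}[/math] and [math]\exp[/math] below are arbitrary illustrative choices:

```python
import numpy as np

def apply_function(a, f):
    """f(a) for a real symmetric matrix a, via the diagonalization a = U D U^T."""
    lam, U = np.linalg.eigh(a)
    return U @ np.diag(f(lam)) @ U.T

np.random.seed(6)
B = np.random.randn(4, 4)
a = B @ B.T                          # positive matrix, spectrum in [0, infinity)

r = apply_function(a, np.sqrt)       # the square root given by functional calculus
print(np.allclose(r @ r, a))         # True: r^2 = a

e = apply_function(a, np.exp)        # sigma(exp(a)) = exp(sigma(a))
print(np.sort(np.linalg.eigvalsh(e)))
print(np.sort(np.exp(np.linalg.eigvalsh(a))))
```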

5d. Spectral measures

In order to get now towards noncommutative probability, we have to develop the theory of positive elements, and linear forms. First, we have the following result:

Proposition

For an element [math]a\in A[/math], the following are equivalent:

  • [math]a[/math] is positive, in the sense that [math]\sigma(a)\subset[0,\infty)[/math].
  • [math]a=b^2[/math], for some [math]b\in A[/math] satisfying [math]b=b^*[/math].
  • [math]a=cc^*[/math], for some [math]c\in A[/math].


Proof

This is something very standard, as follows:


[math](1)\implies(2)[/math] Observe first that [math]\sigma(a)\subset\mathbb R[/math] implies [math]a=a^*[/math]. Thus the algebra [math] \lt a \gt [/math] is commutative, and by using Theorem 5.26, we can set [math]b=\sqrt{a}[/math].


[math](2)\implies(3)[/math] This is trivial, because we can simply set [math]c=b[/math].


[math](2)\implies(1)[/math] This is clear too, because we have:

[[math]] \sigma(a) =\sigma(b^2) =\sigma(b)^2\subset\mathbb R^2 =[0,\infty) [[/math]]


[math](3)\implies(1)[/math] We proceed by contradiction. By multiplying [math]c[/math] by a suitable element of [math] \lt cc^* \gt [/math], we are led to the existence of an element [math]d\neq0[/math] satisfying:

[[math]] -dd^*\geq0 [[/math]]


By writing now [math]d=x+iy[/math] with [math]x=x^*,y=y^*[/math] we have:

[[math]] dd^*+d^*d =2(x^2+y^2) \geq0 [[/math]]


Thus [math]d^*d\geq0[/math], which is easily seen to contradict the condition [math]-dd^*\geq0[/math].
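These equivalences can be checked numerically in the matrix case, as follows; this is a Python/numpy sketch, with the matrix [math]c[/math] below being an arbitrary test choice:

```python
import numpy as np

np.random.seed(7)
c = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
a = c @ c.conj().T                      # an element of the form cc*

print(np.linalg.eigvalsh(a))            # sigma(a) lies in [0, infinity)

# a = b^2 with b self-adjoint, namely b = sqrt(a), via functional calculus
lam, U = np.linalg.eigh(a)
b = U @ np.diag(np.sqrt(lam)) @ U.conj().T
print(np.allclose(b, b.conj().T), np.allclose(b @ b, a))
```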

We can talk as well about positive linear forms, as follows:

Definition

Consider a linear map [math]\varphi:A\to\mathbb C[/math].

  • [math]\varphi[/math] is called positive when [math]a\geq0\implies\varphi(a)\geq0[/math].
  • [math]\varphi[/math] is called faithful and positive when [math]a\geq0,a\neq0\implies\varphi(a) \gt 0[/math].

In the commutative case, [math]A=C(X)[/math], the positive linear forms appear as follows, with [math]\mu[/math] being positive, and strictly positive if we want [math]\varphi[/math] to be faithful and positive:

[[math]] \varphi(f)=\int_Xf(x)d\mu(x) [[/math]]


In general, the positive linear forms can be thought of as being integration functionals with respect to some underlying “positive measures”. We have:

Definition

Let [math]A[/math] be a [math]C^*[/math]-algebra, given with a positive trace [math]tr:A\to\mathbb C[/math].

  • The elements [math]a\in A[/math] are called random variables.
  • The moments of such a variable are the numbers [math]M_k(a)=tr(a^k)[/math].
  • The law of such a variable is the functional [math]\mu_a:P\to tr(P(a))[/math].

Here the exponent [math]k=\circ\bullet\bullet\circ\ldots[/math] is by definition a colored integer, and the powers [math]a^k[/math] are defined by the following formulae, and multiplicativity:

[[math]] a^\emptyset=1\quad,\quad a^\circ=a\quad,\quad a^\bullet=a^* [[/math]]

As for the polynomial [math]P[/math], this is a noncommuting [math]*[/math]-polynomial in one variable:

[[math]] P\in\mathbb C \lt X,X^* \gt [[/math]]


Observe that the law is uniquely determined by the moments, because we have:

[[math]] P(X)=\sum_k\lambda_kX^k \implies\mu_a(P)=\sum_k\lambda_kM_k(a) [[/math]]


At the level of the general theory, we have the following key result, extending the various results that we have, regarding the self-adjoint and normal matrices:

Theorem

Let [math]A[/math] be a [math]C^*[/math]-algebra, with a trace [math]tr[/math], and consider an element [math]a\in A[/math] which is normal, in the sense that [math]aa^*=a^*a[/math].

  • [math]\mu_a[/math] is a complex probability measure, satisfying [math]supp(\mu_a)\subset\sigma(a)[/math].
  • In the self-adjoint case, [math]a=a^*[/math], this measure [math]\mu_a[/math] is real.
  • Assuming that [math]tr[/math] is faithful, we have [math]supp(\mu_a)=\sigma(a)[/math].


Proof

This is something very standard, that we already know for the usual complex matrices, and whose proof in general is quite similar, as follows:


(1) In the normal case, [math]aa^*=a^*a[/math], the Gelfand theorem, or rather the subsequent continuous functional calculus theorem, tells us that we have:

[[math]] \lt a \gt =C(\sigma(a)) [[/math]]


Thus the functional [math]f(a)\to tr(f(a))[/math] can be regarded as an integration functional on the algebra [math]C(\sigma(a))[/math], and by the Riesz theorem, this latter functional must come from a probability measure [math]\mu[/math] on the spectrum [math]\sigma(a)[/math], in the sense that we must have:

[[math]] tr(f(a))=\int_{\sigma(a)}f(z)d\mu(z) [[/math]]


We are therefore led to the conclusions in the statement, with the uniqueness assertion coming from the fact that the elements [math]a^k[/math], taken as usual with respect to colored integer exponents, [math]k=\circ\bullet\bullet\circ\ldots[/math], generate the whole [math]C^*[/math]-algebra [math]C(\sigma(a))[/math].


(2) This is clear from (1), because for a self-adjoint element we have [math]\sigma(a)\subset\mathbb R[/math], by Theorem 5.24.


(3) This is clear too, because if the support of [math]\mu_a[/math] were strictly smaller than [math]\sigma(a)[/math], we could find a nonzero continuous function [math]f\geq0[/math] on [math]\sigma(a)[/math] vanishing on this support, and then we would have [math]tr(f(a))=0[/math] with [math]f(a)\geq0[/math], [math]f(a)\neq0[/math], contradicting faithfulness.

As a first concrete application now, by getting back to the random matrices, and to the various questions raised in the beginning of this chapter, we have:

Theorem

Given a random matrix [math]Z\in M_N(L^\infty(X))[/math] which is normal,

[[math]] ZZ^*=Z^*Z [[/math]]
its law, which is by definition the following abstract functional,

[[math]] \mu:\mathbb C \lt X,X^* \gt \to\mathbb C\quad,\quad P\to\int_Xtr(P(Z)) [[/math]]
when restricted to the usual polynomials in two variables,

[[math]] \mu:\mathbb C[X,X^*]\to\mathbb C\quad,\quad P\to\int_Xtr(P(Z)) [[/math]]
must come from a probability measure on the spectrum [math]\sigma(Z)\subset\mathbb C[/math], as follows:

[[math]] \mu(P)=\int_{\sigma(Z)}P(x)d\mu(x) [[/math]]
We agree to use the symbol [math]\mu[/math] for all these notions.


Proof

This follows indeed from what we know from Theorem 5.30, applied to the normal element [math]a=Z[/math], belonging to the [math]C^*[/math]-algebra [math]A=M_N(L^\infty(X))[/math].
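As a concluding illustration for all this, here is a Monte Carlo sketch in Python with numpy: we sample a self-adjoint random matrix [math]Z\in M_N(L^\infty(X))[/math], and we estimate its moments [math]M_k=\int_Xtr(Z^k)[/math]. The particular model below, with independent Gaussian entries, as well as the matrix size and the number of samples, are arbitrary choices, made for illustration only:

```python
import numpy as np

N, samples = 5, 2000
rng = np.random.default_rng(8)

def sample_Z():
    """One sample Z_x of a self-adjoint random matrix Z in M_N(L^infinity(X)).
    The Gaussian model below is just an illustration, not a canonical choice."""
    B = rng.standard_normal((N, N))
    return (B + B.T) / np.sqrt(2 * N)

# moments M_k = int_X tr(Z^k), estimated by averaging over samples of X;
# since Z = Z*, the colors of the exponents play no role here
moments = np.zeros(5)
for _ in range(samples):
    Z = sample_Z()
    for k in range(1, 6):
        moments[k - 1] += np.trace(np.linalg.matrix_power(Z, k)) / N
moments /= samples

print(moments)    # estimates of M_1, ..., M_5, with the odd ones close to 0
```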
