Diagonalization
4a. Linear spaces
In this chapter we discuss the diagonalization question, with a number of advanced results, for the complex matrices [math]A\in M_N(\mathbb C)[/math]. Our techniques will apply in particular to the real case, [math]A\in M_N(\mathbb R)[/math], and we will obtain a number of non-trivial results regarding the diagonalization of such matrices, over the complex numbers.
In order to obtain our results, the idea will be that of looking at the eigenvectors [math]v\in\mathbb C^N[/math] corresponding to a given eigenvalue [math]\lambda\in\mathbb C[/math], taken altogether:
Given a matrix [math]A\in M_N(\mathbb C)[/math], its eigenspaces are given by:
As a first observation, these eigenspaces are stable under taking sums, and also under multiplying by scalars, due to the following trivial facts:
Abstractly speaking, these two properties tell us that the eigenspaces [math]E_\lambda[/math] are linear subspaces of the vector space [math]\mathbb C^N[/math], in the following sense:
A subset [math]V\subset\mathbb C^N[/math] is called a linear space, or vector space, when:
Before getting into the study of the eigenspaces of a given matrix [math]A\in M_N(\mathbb C)[/math], let us develop some general theory for the arbitrary linear spaces [math]V\subset\mathbb C^N[/math].
As a first objective, we would like to talk about bases of such subspaces [math]V\subset\mathbb C^N[/math]. We already know, from chapter 1, how to do this for [math]V=\mathbb C^N[/math] itself. In order to review and improve the material there, let us start with the following key result:
For a matrix [math]A\in M_N(\mathbb C)[/math], the following are equivalent:
- [math]A[/math] is right invertible: [math]\exists B\in M_N(\mathbb C)[/math], [math]AB=1[/math].
- [math]A[/math] is left invertible: [math]\exists B\in M_N(\mathbb C)[/math], [math]BA=1[/math].
- [math]A[/math] is invertible: [math]\exists B\in M_N(\mathbb C)[/math], [math]AB=BA=1[/math].
- [math]A[/math] has nonzero determinant: [math]\det A\neq0[/math].
This follows indeed from the theory of the determinant, as developed in chapter 3 above, with all the conditions being equivalent to (4).
In terms of linear maps, we are led to the following statement:
For a linear map [math]f:\mathbb C^N\to\mathbb C^N[/math], the following are equivalent:
- [math]f[/math] is injective.
- [math]f[/math] is surjective.
- [math]f[/math] is bijective.
- [math]f(x)=Ax[/math], with [math]A[/math] invertible.
This follows indeed from Proposition 4.3, by using the composition formula [math]f_{AB}=f_Af_B[/math] for the linear maps [math]\mathbb C^N\to\mathbb C^N[/math], written in the form [math]f_A(x)=Ax[/math].
Getting now to the point where we wanted to get, we have:
For a system of vectors [math]v_1,\ldots,v_N\in\mathbb C^N[/math], the following are equivalent:
- The vectors [math]v_i[/math] are linearly independent, in the sense that:
[[math]] \sum\lambda_iv_i=0\implies\lambda_i=0 [[/math]]
- The vectors [math]v_i[/math] span [math]\mathbb C^N[/math], in the sense that any [math]x\in\mathbb C^N[/math] can be written as:
[[math]] x=\sum\lambda_iv_i [[/math]]
- The vectors [math]v_i[/math] form a basis of [math]\mathbb C^N[/math], in the sense that any vector [math]x\in\mathbb C^N[/math] can be written in a unique way as:
[[math]] x=\sum\lambda_iv_i [[/math]]
- The matrix formed by these vectors, regarded as usual as column vectors,
[[math]] P=[v_1,\ldots,v_N]\in M_N(\mathbb C) [[/math]]
is invertible, with respect to the usual multiplication of the matrices.
Given a family of vectors [math]\{v_1,\ldots,v_N\}\subset\mathbb C^N[/math] as in the statement, consider the following linear map, associated to these vectors:
The conditions (1,2,3) in the statement tell us that this map must be respectively injective, surjective, bijective. As for the condition (4) in the statement, this is related as well to this map, because the matrix of this map is the matrix [math]P[/math] there:
With these observations in hand, the result follows from Proposition 4.4.
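As a quick numerical illustration of the criterion (4), here is how one can check that three given vectors of [math]\mathbb C^3[/math] form a basis, in Python with numpy; the vectors and the tolerance are of course our own choices, just for illustration:

```python
import numpy as np

v1, v2, v3 = np.array([1.0, 0, 1]), np.array([0, 1.0, 1]), np.array([1.0, 1, 0])
P = np.column_stack([v1, v2, v3])          # the matrix P = [v1, v2, v3]
print(abs(np.linalg.det(P)) > 1e-10)       # nonzero determinant: the vectors form a basis
```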
More generally now, we have the following result:
For a system of vectors [math]\{v_1,\ldots,v_M\}\subset V[/math] belonging to a linear subspace [math]V\subset\mathbb C^N[/math], the following conditions are equivalent:
- The vectors [math]v_i[/math] are linearly independent, in the sense that:
[[math]] \sum\lambda_iv_i=0\implies\lambda_i=0 [[/math]]
- The vectors [math]v_i[/math] span [math]V[/math], in the sense that any [math]x\in V[/math] can be written as:
[[math]] x=\sum\lambda_iv_i [[/math]]
- The vectors [math]v_i[/math] form a basis of [math]V[/math], in the sense that any vector [math]x\in V[/math] can be written in a unique way as:
[[math]] x=\sum\lambda_iv_i [[/math]]
If these conditions are satisfied, we say that [math]V[/math] has dimension [math]M[/math], and this dimension is independent of the choice of the basis.
This is something that we already know for [math]V=\mathbb C^N[/math] itself, coming from Theorem 4.5, with the dimension discussion at the end being something elementary. As for the extension to the general case, involving an arbitrary linear space [math]V\subset\mathbb C^N[/math], this is something which is routine, by using recurrence arguments, where needed.
4b. Diagonalization
Let us begin with a reminder of the basic diagonalization theory, that we already know. The basic theory that we have so far can be summarized as follows:
Assuming that a matrix [math]A\in M_N(\mathbb C)[/math] is diagonalizable, in the sense that [math]\mathbb C^N[/math] has a basis formed by eigenvectors of [math]A[/math], we have
This is something that we already know, coming by changing the basis. We can prove this by direct computation as well, because we have [math]Pe_i=v_i[/math], and so the matrices [math]A[/math] and [math]PDP^{-1}[/math] act in the same way on the basis vectors [math]v_i[/math]:
Thus, the matrices [math]A[/math] and [math]PDP^{-1}[/math] coincide, as stated.
In general, in order to study the diagonalization problem, the idea is that the eigenvectors can be grouped into linear spaces, called eigenspaces, as follows:
Let [math]A\in M_N(\mathbb C)[/math], and for any eigenvalue [math]\lambda\in\mathbb C[/math] define the corresponding eigenspace as being the vector space formed by the corresponding eigenvectors:
We prove the first assertion by recurrence on [math]k\in\mathbb N[/math]. Assume by contradiction that we have a formula as follows, with the scalars [math]c_1,\ldots,c_k[/math] being not all zero:
By dividing by one of these scalars, we can assume that our formula is:
Now let us apply [math]A[/math] to this vector. On the left we obtain:
On the right we obtain something different, as follows:
We conclude from this that the following equality must hold:
On the other hand, we know by recurrence that the vectors [math]v_1,\ldots,v_{k-1}[/math] must be linearly independent. Thus, the coefficients on the left and on the right must be equal:
Now since at least one [math]c_i[/math] must be nonzero, from the corresponding equality [math]\lambda_kc_i=c_i\lambda_i[/math] we obtain [math]\lambda_k=\lambda_i[/math], which is a contradiction. Thus our proof by recurrence of the first assertion is complete. As for the second assertion, this follows from the first one.
In order to reach now to more advanced results, we can use the characteristic polynomial. Here is a result summarizing and improving our knowledge of the subject:
Given a matrix [math]A\in M_N(\mathbb C)[/math], consider its characteristic polynomial:
The first assertion follows from the following computation, using the fact that a linear map is bijective when the determinant of the associated matrix is nonzero:
Regarding now the second assertion, given an eigenvalue [math]\lambda[/math] of our matrix [math]A[/math], consider the dimension of the corresponding eigenspace:
By changing the basis of [math]\mathbb C^N[/math], so that the eigenspace [math]E_\lambda[/math] is spanned by the first [math]d_\lambda[/math] basis elements, our matrix becomes as follows, with [math]B[/math] being a certain smaller matrix:
We conclude that the characteristic polynomial of [math]A[/math] is of the following form:
Thus we have [math]m_\lambda\geq d_\lambda[/math], which leads to the conclusion in the statement.
We can put together Theorem 4.8 and Theorem 4.9, and by using as well the fact that any complex polynomial of degree [math]N[/math] has exactly [math]N[/math] complex roots, when counted with multiplicities, that we know from chapter 3 above, we obtain the following result:
Given a matrix [math]A\in M_N(\mathbb C)[/math], consider its characteristic polynomial
then factorize this polynomial, by computing the complex roots, with multiplicities,
This follows indeed from Theorem 4.8 and Theorem 4.9, and from the above-mentioned result regarding the complex roots of complex polynomials.
This was the main result of linear algebra. There are countless applications of this, and generally speaking, advanced linear algebra consists in building on Theorem 4.10. Let us record as well a useful algorithmic version of the above result:
The square matrices [math]A\in M_N(\mathbb C)[/math] can be diagonalized as follows:
- Compute the characteristic polynomial.
- Factorize the characteristic polynomial.
- Compute the eigenvectors, for each eigenvalue found.
- If there are fewer than [math]N[/math] linearly independent eigenvectors, [math]A[/math] is not diagonalizable.
- Otherwise, [math]A[/math] is diagonalizable, [math]A=PDP^{-1}[/math].
This is an informal reformulation of Theorem 4.10, with (4) referring to the total number of linearly independent eigenvectors found in (3), and with [math]A=PDP^{-1}[/math] in (5) being the usual diagonalization formula, with [math]P,D[/math] being as before.
As a remark here, in step (3) it is always better to start with the eigenvalues having big multiplicity. Indeed, an eigenvalue of multiplicity 1, for instance, can never end the computation via (4), simply because an eigenvector for it always exists.
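As an illustration, here is a minimal numerical version of this procedure, written in Python with numpy; this is just a sketch, with the function name `diagonalize`, the test matrix and the tolerance being our own choices, and with a numerical rank test standing for the eigenvector count in (4):

```python
import numpy as np

def diagonalize(A, tol=1e-10):
    # steps (1)-(3): numerically, np.linalg.eig computes the eigenvalues
    # (the roots of the characteristic polynomial) and the eigenvectors at once
    eigenvalues, P = np.linalg.eig(A)
    # step (4): A is diagonalizable when the eigenvectors span C^N,
    # that is, when the matrix P formed by them is invertible
    if np.linalg.matrix_rank(P, tol=tol) < A.shape[0]:
        raise ValueError("matrix is not diagonalizable")
    # step (5): the diagonalization formula A = P D P^{-1}
    return P, np.diag(eigenvalues)

A = np.array([[2.0, 1.0], [0.0, 3.0]])
P, D = diagonalize(A)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True
```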
4c. Matrix tricks
At the level of basic examples of diagonalizable matrices, we first have:
For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent,
- The eigenvalues are different, [math]\lambda_i\neq\lambda_j[/math],
- The characteristic polynomial [math]P[/math] has simple roots,
- The characteristic polynomial satisfies [math](P,P')=1[/math],
and in this case, the matrix is diagonalizable.
The equivalences in the statement are clear, the idea being as follows:
[math](1)\iff(2)[/math] This follows from Theorem 4.10.
[math](2)\iff(3)[/math] This is standard, the double roots of [math]P[/math] being roots of [math]P'[/math].
Regarding now the last assertion, this follows from Theorem 4.10 as well. Indeed, the criterion there for diagonalization involving the sum of dimensions of eigenspaces is trivially satisfied when the eigenvalues are different, because:
Thus, we are led to the conclusion in the statement.
As an important comment, the assumptions of Theorem 4.12 can be effectively verified in practice, without the need for factorizing polynomials, the idea here being that of using the condition (3) there. In order to discuss this, let us start with:
Given two polynomials [math]P,Q\in\mathbb C[X][/math], written as follows,
Given polynomials [math]P,Q\in\mathbb C[X][/math], we can certainly construct the quantity [math]R(P,Q)[/math] in the statement, and we have then [math]R(P,Q)=0[/math] precisely when [math]P,Q[/math] have a common root. The whole point is that of proving that [math]R(P,Q)[/math] is a polynomial in the coefficients of [math]P,Q[/math], with integer coefficients. But this can be checked as follows:
(1) We can expand the formula of [math]R(P,Q)[/math], and in what regards [math]a_1,\ldots,a_k[/math], which are the roots of [math]P[/math], we obtain in this way certain symmetric functions in these variables, which will be therefore polynomials in the coefficients of [math]P[/math], with integer coefficients.
(2) We can then look what happens with respect to the remaining variables [math]b_1,\ldots,b_l[/math], which are the roots of [math]Q[/math]. Once again what we have here are certain symmetric functions, and so polynomials in the coefficients of [math]Q[/math], with integer coefficients.
(3) Thus, we are led to the conclusion in the statement, that [math]R(P,Q)[/math] is a polynomial in the coefficients of [math]P,Q[/math], with integer coefficients, and with the remark that the [math]c^ld^k[/math] factor is there for these latter coefficients to be indeed integers, instead of rationals.
All this might seem a bit complicated, so as an illustration, let us work out an example. Consider the case of a polynomial of degree 2, and a polynomial of degree 1:
In order to compute the resultant, let us factorize our polynomials:
The resultant can be then computed as follows, by using the method above:
Finally, observe that [math]R(P,Q)=0[/math] corresponds indeed to the fact that [math]P,Q[/math] have a common root. Indeed, the root of [math]Q[/math] is [math]r=-e/d[/math], and we have:
Thus we have [math]P(r)=0[/math] precisely when [math]R(P,Q)=0[/math], as predicted by Theorem 4.13.
In general now, Theorem 4.13 holds of course, with its formal proof being the one indicated above, using a bit of abstract algebra, and symmetric functions, and it is an exercise for you to work out the details of that proof, along with some further illustrations too.
Regarding now the explicit formula of the resultant [math]R(P,Q)[/math], this is something quite complicated, and there are several methods for dealing with this problem. There is a slight similarity between Theorem 4.13 and the Vandermonde determinants discussed in chapter 2, and we have in fact the following formula for [math]R(P,Q)[/math]:
The resultant of two polynomials, written as
This is something quite clever, due to Sylvester, as follows:
(1) Consider the vector space [math]\mathbb C_k[X][/math] formed by the polynomials of degree [math] \lt k[/math]:
This is a vector space of dimension [math]k[/math], having as basis the monomials [math]1,X,\ldots,X^{k-1}[/math]. Now given polynomials [math]P,Q[/math] as in the statement, consider the following linear map:
(2) Our first claim is that with respect to the standard bases for all the vector spaces involved, namely those consisting of the monomials [math]1,X,X^2,\ldots[/math], the matrix of [math]\Phi[/math] is the matrix in the statement. But this is something which is clear from definitions.
(3) Our second claim is that [math]\det\Phi=0[/math] happens precisely when [math]P,Q[/math] have a common root. Indeed, our polynomials [math]P,Q[/math] having a common root means that we can find nonzero polynomials [math]A,B[/math] such that [math]AP+BQ=0[/math], and so that [math](A,B)\in\ker\Phi[/math] is nonzero, which reads [math]\det\Phi=0[/math].
(4) Finally, our claim is that we have [math]\det\Phi=R(P,Q)[/math]. But this follows from the uniqueness of the resultant, up to a scalar, and with this uniqueness property being elementary to establish, along the lines of the proof of Theorem 4.13.
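As a numerical illustration of the above result, here is a short Python sketch, building a Sylvester-type matrix out of the coefficients, and comparing its determinant with the product formula [math]R(P,Q)=c^ld^k\prod_{ij}(a_i-b_j)[/math] defining the resultant; the polynomials chosen, and the exact row layout used, are our own, and signs might differ with other conventions:

```python
import numpy as np

def sylvester_resultant(p, q):
    # p, q: coefficient lists, highest degree first; P has degree k, Q has degree l
    k, l = len(p) - 1, len(q) - 1
    S = np.zeros((k + l, k + l))
    for i in range(l):                   # l shifted rows of P-coefficients
        S[i, i:i + k + 1] = p
    for j in range(k):                   # k shifted rows of Q-coefficients
        S[l + j, j:j + l + 1] = q
    return np.linalg.det(S)

# P = 2(X-1)(X-3) = 2X^2 - 8X + 6, Q = 5(X-2) = 5X - 10
print(sylvester_resultant([2.0, -8.0, 6.0], [5.0, -10.0]))
# the product formula gives 2 * 25 * (1-2)(3-2) = -50
```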
In what follows we will not really need the above formula, but let us check now that this formula works indeed. Consider our favorite polynomials, as before:
According to the above result, the resultant is then given by the following determinant, which is indeed what it should be:
Now back to our diagonalization questions, we want to compute [math]R(P,P')[/math], where [math]P[/math] is the characteristic polynomial. So, we need one more piece of theory, as follows:
Given a polynomial [math]P\in\mathbb C[X][/math], written as
This follows from Theorem 4.13, applied with [math]Q=P'[/math], with the division by [math]c[/math] being indeed possible, over [math]\mathbb Z[/math], and with the sign being there for various reasons, including the compatibility with some well-known formulae, at small values of [math]N\in\mathbb N[/math].
Observe that, by using Theorem 4.14, we have an explicit formula for the discriminant, as the determinant of a certain associated matrix. There is a lot of theory here, and in order to close the discussion, let us see what happens in degree 2. Here we have:
Thus, the resultant is given by the following formula:
With the normalizations in Theorem 4.15 made, we obtain, as we should:
Now back to our questions, we can upgrade Theorem 4.12, as follows:
For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent,
- The eigenvalues are different, [math]\lambda_i\neq\lambda_j[/math],
- The characteristic polynomial [math]P[/math] has simple roots,
- The discriminant of [math]P[/math] is nonzero, [math]\Delta(P)\neq0[/math],
and in this case, the matrix is diagonalizable.
This is indeed an upgrade of Theorem 4.12, by replacing the condition (3) there with the condition [math]\Delta(P)\neq0[/math], which is something much better, because it is something algorithmic, ultimately requiring only computations of determinants.
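As an illustration, and assuming the function `sylvester_resultant` from the previous sketch, the criterion can be implemented numerically along the following lines; the tolerance and the normalization are our own rough choices, only the vanishing of [math]R(P,P')[/math] being what matters here:

```python
import numpy as np

def has_distinct_eigenvalues(A, tol=1e-8):
    p = np.poly(A)                       # coefficients of the characteristic polynomial
    dp = np.polyder(p)                   # coefficients of its derivative P'
    # by Theorem 4.16, the eigenvalues are distinct precisely when the discriminant,
    # which is R(P,P') up to sign and normalization, is nonzero
    return abs(sylvester_resultant(list(p), list(dp))) > tol

print(has_distinct_eigenvalues(np.array([[0.0, 1.0], [1.0, 0.0]])))   # True:  eigenvalues 1, -1
print(has_distinct_eigenvalues(np.array([[1.0, 1.0], [0.0, 1.0]])))   # False: double eigenvalue 1
```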
As already mentioned before, one can prove that the matrices having distinct eigenvalues are “generic”, and so the above result basically captures the whole situation. We have in fact the following collection of density results, all being very useful:
The following happen, inside [math]M_N(\mathbb C)[/math]:
- The invertible matrices are dense.
- The matrices having distinct eigenvalues are dense.
- The diagonalizable matrices are dense.
These are quite advanced linear algebra results, which can be proved as follows, with the technology that we have so far:
(1) This is clear, intuitively speaking, because the invertible matrices are given by the condition [math]\det A\neq 0[/math]. Thus, the set formed by these matrices appears as the complement of the surface [math]\det A=0[/math], and so must be dense inside [math]M_N(\mathbb C)[/math], as claimed.
(2) Here we can use a similar argument, this time by saying that the set formed by the matrices having distinct eigenvalues appears as the complement of the surface given by [math]\Delta(P_A)=0[/math], and so must be dense inside [math]M_N(\mathbb C)[/math], as claimed.
(3) This follows from (2), via the fact that the matrices having distinct eigenvalues are diagonalizable, that we know from Theorem 4.16. There are of course some other proofs as well, for instance by putting the matrix in Jordan form.
As an application of the above results, and of our methods in general, we can now establish a number of useful and interesting linear algebra results, as follows:
The following happen:
- We have [math]P_{AB}=P_{BA}[/math], for any two matrices [math]A,B\in M_N(\mathbb C)[/math].
- [math]AB,BA[/math] have the same eigenvalues, with the same multiplicities.
- If [math]A[/math] has eigenvalues [math]\lambda_1,\ldots,\lambda_N[/math], then [math]f(A)[/math] has eigenvalues [math]f(\lambda_1),\ldots,f(\lambda_N)[/math].
These results can be deduced by using Theorem 4.17, as follows:
(1) It follows from definitions that the characteristic polynomial of a matrix is invariant under conjugation, in the sense that we have:
Now observe that, when assuming that [math]A[/math] is invertible, we have:
Thus, we have the result when [math]A[/math] is invertible. By using now Theorem 4.17 (1), we conclude that this formula holds for any matrix [math]A[/math], by continuity.
(2) This is a reformulation of (1) above, via the fact that the characteristic polynomial [math]P[/math] encodes the eigenvalues, counted with multiplicities; note that proving (2) directly, with bare hands, would be quite difficult.
(3) This is something more informal, the idea being that this is clear for the diagonal matrices [math]D[/math], then for the diagonalizable matrices [math]PDP^{-1}[/math], and finally for all the matrices, by using Theorem 4.17 (3), provided that [math]f[/math] has suitable regularity properties.
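As a quick numerical illustration of (3), here is a check with [math]f[/math] being the exponential, computed via scipy; the test matrix is our own choice:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [0.0, 3.0]])          # eigenvalues 1, 3
eigs_A  = np.linalg.eigvals(A)
eigs_fA = np.linalg.eigvals(expm(A))            # f(A), with f = exp
print(np.allclose(np.sort(np.exp(eigs_A)), np.sort(eigs_fA)))   # True: exp(1), exp(3)
```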
We will be back to this, with more details, in chapter 8 below, when doing spectral theory. Now getting away from all this advanced mathematics, let us go back to the main problem raised by the diagonalization procedure, namely the computation of the roots of characteristic polynomials. As a first trivial observation, in degree 2 we have:
The roots of a degree [math]2[/math] polynomial of the form
This is trivial, coming from [math]P=(X-r)(X-s)[/math]. In practice, this is very useful, and this is how calculus teachers are often faster than their students.
In the matrix setting now, the result is as follows:
The eigenvalues of a square matrix
Once again, this is something trivial, coming from Proposition 4.19. As before, this is a well-known calculus teacher trick.
Finally, we have the following result, which is useful as well:
Assume that we have a polynomial as follows, with integer coefficients, and with the leading term being [math]1[/math]:
This is clear, because any integer root [math]c\in\mathbb Z[/math] of our polynomial must satisfy:
But modulo [math]c[/math], this equation simply reads [math]a_0=0[/math], that is, [math]c[/math] divides [math]a_0[/math], as desired.
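As an illustration of how this result is used, here is a naive Python routine which finds the integer roots of such a polynomial, by simply testing the divisors of [math]a_0[/math]; the coefficient convention, highest degree first, is ours:

```python
def integer_roots(coeffs):
    # coeffs: integer coefficients, highest degree first, with leading term 1
    a0 = coeffs[-1]
    if a0 == 0:                                # factor out X, and recurse
        return sorted([0] + integer_roots(coeffs[:-1]))
    divisors = {d for d in range(1, abs(a0) + 1) if a0 % d == 0}
    candidates = divisors | {-d for d in divisors}
    def value(c):                              # Horner evaluation of the polynomial at c
        v = 0
        for a in coeffs:
            v = v * c + a
        return v
    return sorted(c for c in candidates if value(c) == 0)

print(integer_roots([1, -6, 11, -6]))          # [1, 2, 3], since X^3-6X^2+11X-6 = (X-1)(X-2)(X-3)
```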
Getting now into more conceptual mathematics, the two formulae from Proposition 4.20 are quite interesting, and can be generalized as follows:
The complex eigenvalues of a matrix [math]A\in M_N(\mathbb C)[/math], counted with multiplicities, have the following properties:
- Their sum is the trace.
- Their product is the determinant.
Consider indeed the characteristic polynomial [math]P[/math] of the matrix:
We can factorize this polynomial, by using its [math]N[/math] complex roots, and we obtain:
Thus, we are led to the conclusion in the statement.
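Numerically, this can be checked at once, say on a random complex matrix, as follows:

```python
import numpy as np

A = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
eigs = np.linalg.eigvals(A)
print(np.isclose(np.sum(eigs),  np.trace(A)))           # sum of eigenvalues = trace
print(np.isclose(np.prod(eigs), np.linalg.det(A)))      # product of eigenvalues = determinant
```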
Regarding now the intermediate terms, we have here:
Assume that [math]A\in M_N(\mathbb C)[/math] has eigenvalues [math]\lambda_1,\ldots,\lambda_N\in\mathbb C[/math], counted with multiplicities. The basic symmetric functions of these eigenvalues, namely
These results can be proved by doing some algebra, as follows:
(1) Consider indeed the characteristic polynomial [math]P[/math] of the matrix, factorized by using its [math]N[/math] complex roots, taken with multiplicities. By expanding, we obtain:
With the convention [math]c_0=1[/math], we are led to the conclusion in the statement.
(2) This is something standard, coming by doing some abstract algebra. Working out the formulae for the sums of powers [math]d_s=\sum_i\lambda_i^s[/math], at small values of the exponent [math]s\in\mathbb N[/math], is an excellent exercise, which shows how to proceed in general, by recurrence.
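Regarding the sums of powers appearing in (2), note that these can be read directly on the matrix, because [math]d_s=\sum_i\lambda_i^s[/math] coincides with the trace of [math]A^s[/math], by the results above. Here is a quick numerical check of this fact, on a random matrix:

```python
import numpy as np

A = np.random.randn(3, 3)
eigs = np.linalg.eigvals(A)
for s in range(1, 5):
    # d_s = sum of lambda_i^s coincides with Tr(A^s)
    d_s = np.sum(eigs ** s)
    print(np.isclose(d_s, np.trace(np.linalg.matrix_power(A, s))))
```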
4d. Spectral theorems
Let us go back now to the diagonalization question. Here is a key result:
Any matrix [math]A\in M_N(\mathbb C)[/math] which is self-adjoint, [math]A=A^*[/math], is diagonalizable, with the diagonalization being of the following type,
As a first remark, the converse trivially holds, because if we take a matrix of the form [math]A=UDU^*[/math], with [math]U[/math] unitary and [math]D[/math] diagonal and real, then we have:
In the other sense now, assume that [math]A[/math] is self-adjoint, [math]A=A^*[/math]. Our first claim is that the eigenvalues are real. Indeed, assuming [math]Av=\lambda v[/math], we have:
Thus we obtain [math]\lambda\in\mathbb R[/math], as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:
We have then the following computation, using [math]\lambda,\mu\in\mathbb R[/math]:
Thus [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math], as claimed. In order now to finish, it remains to prove that the eigenspaces span [math]\mathbb C^N[/math]. For this purpose, we will use a recurrence method. Let us pick an eigenvector, [math]Av=\lambda v[/math]. Assuming [math]v\perp w[/math], we have:
Thus, if [math]v[/math] is an eigenvector, then the vector space [math]v^\perp[/math] is invariant under [math]A[/math]. In order to do the recurrence, it still remains to prove that the restriction of [math]A[/math] to the vector space [math]v^\perp[/math] is self-adjoint. But this comes from a general property of the self-adjoint matrices, that we will explain now. Our claim is that an arbitrary square matrix [math]A[/math] is self-adjoint precisely when the following happens, for any vector [math]v[/math]:
Indeed, the fact that the above scalar product is real is equivalent to:
But this is equivalent, by developing the scalar product, to [math]A=A^*[/math], so our claim is proved. Now back to our questions, it is clear from our self-adjointness criterion above that the restriction of [math]A[/math] to any invariant subspace, and in particular to the subspace [math]v^\perp[/math], is self-adjoint. Thus, we can proceed by recurrence, and we obtain the result.
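Numerically, the diagonalization of a self-adjoint matrix is best done with a routine dedicated to this, such as numpy's eigh, which returns real eigenvalues and an orthonormal basis of eigenvectors; here is a minimal illustration, the test matrix being our own random choice:

```python
import numpy as np

B = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
A = B + B.conj().T                                # a self-adjoint matrix, A = A*
d, U = np.linalg.eigh(A)                          # d is real, U is unitary
print(np.allclose(A, U @ np.diag(d) @ U.conj().T))        # A = U D U*
print(np.allclose(U.conj().T @ U, np.eye(4)))             # U* U = 1, so U is unitary
```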
Let us record as well the real version of the above result:
Any matrix [math]A\in M_N(\mathbb R)[/math] which is symmetric, in the sense that
As before, the converse trivially holds, because if we take a matrix of the form [math]A=UDU^t[/math], with [math]U[/math] orthogonal and [math]D[/math] diagonal and real, then we have [math]A^t=A[/math]. In the other sense now, this follows from Theorem 4.24, and its proof.
As basic examples of self-adjoint matrices, we have the orthogonal projections. The diagonalization result regarding them is as follows:
The matrices [math]P\in M_N(\mathbb C)[/math] which are projections, [math]P^2=P=P^*[/math], are precisely those which diagonalize as follows,
We know that the equation for the projections is as follows:
Thus the eigenvalues are real, and then, by using [math]P^2=P[/math], we obtain:
We therefore obtain [math]\lambda\in\{0,1\}[/math], as claimed, and as a final conclusion here, by using the diagonalization result for the self-adjoint matrices, the diagonalization of our projection is as follows, with [math]e_i\in \{0,1\}[/math]:
To be more precise, the number of 1 values is the dimension of the image of [math]P[/math], and the number of 0 values is the dimension of the space of vectors sent to 0 by [math]P[/math].
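As a basic example, here is the rank-one projection onto the line spanned by a vector [math]v[/math], checked numerically; the vector chosen is of course our own:

```python
import numpy as np

v = np.array([1.0, 1.0j, 2.0])
P = np.outer(v, v.conj()) / np.vdot(v, v)         # projection onto the line C v
print(np.allclose(P @ P, P))                      # P^2 = P
print(np.allclose(P, P.conj().T))                 # P = P*
print(np.round(np.linalg.eigvalsh(P), 6))         # eigenvalues: one 1, the rest 0
```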
In the real case, the result regarding the projections is as follows:
The matrices [math]P\in M_N(\mathbb R)[/math] which are projections,
This follows indeed from Proposition 4.26, and its proof.
An important class of self-adjoint matrices, that we will discuss now, and which includes all the projections, is that of the positive matrices. The general theory here is as follows:
For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent, and if they are satisfied, we say that [math]A[/math] is positive:
- [math]A=B^2[/math], with [math]B=B^*[/math].
- [math]A=CC^*[/math], for some [math]C\in M_N(\mathbb C)[/math].
- [math] \lt Ax,x \gt \geq0[/math], for any vector [math]x\in\mathbb C^N[/math].
- [math]A=A^*[/math], and the eigenvalues are positive, [math]\lambda_i\geq0[/math].
- [math]A=UDU^*[/math], with [math]U\in U_N[/math] and with [math]D\in M_N(\mathbb R_+)[/math] diagonal.
The idea is that the equivalences in the statement basically follow from some elementary computations, with only Theorem 4.24 needed, at some point:
[math](1)\implies(2)[/math] This is clear, because we can take [math]C=B[/math].
[math](2)\implies(3)[/math] This follows from the following computation:
[math](3)\implies(4)[/math] By using the fact that [math] \lt Ax,x \gt [/math] is real, we have:
Thus we have [math]A=A^*[/math], and the remaining assertion, regarding the eigenvalues, follows from the following computation, assuming [math]Ax=\lambda x[/math]:
[math](4)\implies(5)[/math] This follows indeed by using Theorem 4.24.
[math](5)\implies(1)[/math] Assuming [math]A=UDU^*[/math] as in the statement, we can set [math]B=U\sqrt{D}U^*[/math]. Then this matrix [math]B[/math] is self-adjoint, and its square is given by:
Thus, we are led to the conclusion in the statement.
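Observe that the proof of [math](5)\implies(1)[/math] is in fact an algorithm for extracting the square root of a positive matrix, which can be sketched numerically as follows, the function name being ours:

```python
import numpy as np

def sqrt_positive(A):
    # A assumed positive: write A = U D U* with D >= 0, then take B = U sqrt(D) U*
    d, U = np.linalg.eigh(A)
    return U @ np.diag(np.sqrt(np.clip(d, 0, None))) @ U.conj().T

C = np.random.randn(3, 3) + 1j * np.random.randn(3, 3)
A = C @ C.conj().T                                 # a positive matrix, as in (2)
B = sqrt_positive(A)
print(np.allclose(B, B.conj().T))                  # B = B*
print(np.allclose(B @ B, A))                       # B^2 = A, as in (1)
```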
Let us record as well the following technical version of the above result:
For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent, and if they are satisfied, we say that [math]A[/math] is strictly positive:
- [math]A=B^2[/math], with [math]B=B^*[/math], invertible.
- [math]A=CC^*[/math], for some [math]C\in M_N(\mathbb C)[/math] invertible.
- [math] \lt Ax,x \gt \gt 0[/math], for any nonzero vector [math]x\in\mathbb C^N[/math].
- [math]A=A^*[/math], and the eigenvalues are strictly positive, [math]\lambda_i \gt 0[/math].
- [math]A=UDU^*[/math], with [math]U\in U_N[/math] and with [math]D\in M_N(\mathbb R_+^*)[/math] diagonal.
This follows either from Theorem 4.28, by adding the various extra assumptions in the statement, or from the proof of Theorem 4.28, by modifying where needed.
The positive matrices are quite important, for a number of reasons. On one hand, these are the matrices [math]A\in M_N(\mathbb C)[/math] having a square root [math]\sqrt{A}\in M_N(\mathbb C)[/math], as shown by our positivity condition (1). On the other hand, any matrix [math]A\in M_N(\mathbb C)[/math] produces the positive matrix [math]A^*A\in M_N(\mathbb C)[/math], as shown by our positivity condition (2). We can combine these two observations, and we are led to the following construction, for any [math]A\in M_N(\mathbb C)[/math]:
This is something quite interesting, because at [math]N=1[/math] what we have here is the construction of the absolute value of the complex numbers, [math]|z|=\sqrt{z\bar{z}}[/math]. This suggests using the notation [math]|A|=\sqrt{A^*A}[/math], and then looking for a decomposition result of type:
We will be back to this type of decomposition later, called polar decomposition, at the end of the present chapter, after developing some more general theory.
Let us discuss now the case of the unitary matrices. We have here:
Any matrix [math]U\in M_N(\mathbb C)[/math] which is unitary, [math]U^*=U^{-1}[/math], is diagonalizable, with the eigenvalues on [math]\mathbb T[/math]. More precisely we have
As a first remark, the converse trivially holds, because given a matrix of type [math]U=VDV^*[/math], with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] being diagonal, we have:
Let us prove now the first assertion, stating that the eigenvalues of a unitary matrix [math]U\in U_N[/math] belong to [math]\mathbb T[/math]. Indeed, assuming [math]Uv=\lambda v[/math], we have:
Thus we obtain [math]\lambda\in\mathbb T[/math], as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:
We have then the following computation, using [math]U^*=U^{-1}[/math] and [math]\lambda,\mu\in\mathbb T[/math]:
Thus [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math], as claimed. In order now to finish, it remains to prove that the eigenspaces span [math]\mathbb C^N[/math]. For this purpose, we will use a recurrence method. Let us pick an eigenvector, [math]Uv=\lambda v[/math]. Assuming [math]v\perp w[/math], we have:
Thus, if [math]v[/math] is an eigenvector, then the vector space [math]v^\perp[/math] is invariant under [math]U[/math]. Now since [math]U[/math] is an isometry, so is its restriction to this space [math]v^\perp[/math]. Thus this restriction is a unitary, and so we can proceed by recurrence, and we obtain the result.
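Numerically, the fact that the eigenvalues of a unitary matrix lie on the unit circle can be checked as follows, with the unitary obtained here from the QR decomposition of a random complex matrix, which is a standard way of producing one:

```python
import numpy as np

G = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
U, _ = np.linalg.qr(G)                             # the Q factor is a unitary matrix
print(np.allclose(U.conj().T @ U, np.eye(4)))      # U* = U^{-1}
print(np.allclose(np.abs(np.linalg.eigvals(U)), 1.0))   # eigenvalues on the unit circle T
```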
Let us record as well the real version of the above result, in a weak form:
Any matrix [math]U\in M_N(\mathbb R)[/math] which is orthogonal, [math]U^t=U^{-1}[/math], is diagonalizable, with the eigenvalues on [math]\mathbb T[/math]. More precisely we have
This follows indeed from Theorem 4.30.
Observe that the above result does not provide us with a complete characterization of the matrices [math]U\in M_N(\mathbb R)[/math] which are orthogonal. To be more precise, the question left is that of understanding when the matrices of type [math]U=VDV^*[/math], with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] being diagonal, are real, and this is something non-trivial.
Back to generalities, the self-adjoint matrices and the unitary matrices are particular cases of the general notion of a “normal matrix”, and we have here:
Any matrix [math]A\in M_N(\mathbb C)[/math] which is normal, [math]AA^*=A^*A[/math], is diagonalizable, with the diagonalization being of the following type,
As a first remark, the converse trivially holds, because if we take a matrix of the form [math]A=UDU^*[/math], with [math]U[/math] unitary and [math]D[/math] diagonal, then we have:
In the other sense now, this is something more technical. Our first claim is that a matrix [math]A[/math] is normal precisely when the following happens, for any vector [math]v[/math]:
Indeed, the above equality can be written as follows:
But this is equivalent to [math]AA^*=A^*A[/math], by using the polarization identity. Our claim now is that [math]A,A^*[/math] have the same eigenvectors, with conjugate eigenvalues:
Indeed, this follows from the following computation, and from the trivial fact that if [math]A[/math] is normal, then so is any matrix of type [math]A-\lambda 1_N[/math]:
Let us prove now, by using this, that the eigenspaces of [math]A[/math] are pairwise orthogonal. Assuming [math]Av=\lambda v[/math] and [math]Aw=\mu w[/math] with [math]\lambda\neq\mu[/math], we have:
Thus [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math], as claimed. In order to finish now the proof, it remains to prove that the eigenspaces of [math]A[/math] span the whole [math]\mathbb C^N[/math]. This is something that we have already seen for the self-adjoint matrices, and for the unitaries, and we will use here these results, in order to deal with the general normal case. As a first observation, given an arbitrary matrix [math]A[/math], the matrix [math]AA^*[/math] is self-adjoint:
Thus, we can diagonalize this matrix [math]AA^*[/math], as follows, with the passage matrix being a unitary, [math]V\in U_N[/math], and with the diagonal form being real, [math]E\in M_N(\mathbb R)[/math]:
Now observe that, for matrices of type [math]A=UDU^*[/math], which are those that we are supposed to deal with, we have [math]V=U,E=D\bar{D}[/math]. In particular, [math]A[/math] and [math]AA^*[/math] have the same eigenspaces. So, this will be our idea, proving that the eigenspaces of [math]AA^*[/math] are eigenspaces of [math]A[/math]. In order to do so, let us pick two eigenvectors [math]v,w[/math] of the matrix [math]AA^*[/math], corresponding to different eigenvalues, [math]\lambda\neq\mu[/math]. The eigenvalue equations are then as follows:
We have the following computation, using the normality condition [math]AA^*=A^*A[/math], and the fact that the eigenvalues of [math]AA^*[/math], and in particular [math]\mu[/math], are real:
We conclude that we have [math] \lt Av,w \gt =0[/math]. But this reformulates as follows:
Now since the eigenspaces of [math]AA^*[/math] are pairwise orthogonal, and span the whole [math]\mathbb C^N[/math], we deduce from this that these eigenspaces are invariant under [math]A[/math]:
But with this result in hand, we can finish. Indeed, we can decompose the problem, and the matrix [math]A[/math] itself, following these eigenspaces of [math]AA^*[/math], which in practice amounts to saying that we can assume that we only have 1 eigenspace. But by rescaling, this is the same as assuming that we have [math]AA^*=1[/math], and with this done, we are now in the unitary case, which we know how to solve, as explained in Theorem 4.30.
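As a numerical illustration, here is how one can produce a normal matrix which is neither self-adjoint nor unitary, and check the above diagonalization statement on it; the construction, via a random unitary and a random complex diagonal matrix, is our own choice:

```python
import numpy as np

U, _ = np.linalg.qr(np.random.randn(3, 3) + 1j * np.random.randn(3, 3))  # a random unitary
d = np.random.randn(3) + 1j * np.random.randn(3)                          # random complex eigenvalues
A = U @ np.diag(d) @ U.conj().T                    # A = U D U*, a normal matrix
print(np.allclose(A @ A.conj().T, A.conj().T @ A))                        # A A* = A* A
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)), np.sort_complex(d)))  # eigenvalues are the d_i
```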
Let us discuss now, as a final topic, one more important result, namely the polar decomposition. The idea will be that of writing a formula as follows:
To be more precise, [math]U[/math] will be here a partial isometry, a generalization of the notion of isometry, and the matrix [math]|A|[/math] will be a kind of absolute value of [math]A[/math].
In order to discuss this, let us first discuss the absolute values. We have here:
Given a matrix [math]A\in M_N(\mathbb C)[/math], we can construct a matrix [math]|A|[/math] as follows, by using the fact that [math]A^*A[/math] is diagonalizable, with positive eigenvalues:
Consider indeed the matrix [math]A^*A[/math], which is self-adjoint, and in particular normal. According to Theorem 4.32, we can diagonalize this matrix as follows, with [math]U\in U_N[/math], and with [math]D[/math] diagonal:
Since we have [math]A^*A\geq0[/math], it follows that we have [math]D\geq0[/math], which means that the entries of [math]D[/math] are real, and positive. Thus we can extract the square root [math]\sqrt{D}[/math], and then set:
Now if we call this latter matrix [math]|A|[/math], we are led to the conclusions in the statement, namely [math]|A|\geq0[/math], and [math]|A|^2=A^*A[/math]. Finally, the last assertion is clear from definitions.
We can now formulate a first polar decomposition result, as follows:
Any invertible matrix [math]A\in M_N(\mathbb C)[/math] decomposes as
This is routine, and follows by comparing the actions of [math]A,|A|[/math] on the vectors [math]v\in\mathbb C^N[/math], and deducing from this the existence of a unitary [math]U\in U_N[/math] as above.
Observe that at [math]N=1[/math] we obtain in this way the usual polar decomposition of the nonzero complex numbers. There are of course many other examples.
More generally now, we have the following result:
Any square matrix [math]A\in M_N(\mathbb C)[/math] decomposes as
Once again, this follows by comparing the actions of [math]A,|A|[/math] on the vectors [math]v\in\mathbb C^N[/math], and deducing from this the existence of a partial isometry [math]U[/math] as above. Alternatively, we can get this from Theorem 4.34, applied on the complement of the 0-eigenvectors.
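Numerically, the polar decomposition can be obtained for instance from the singular value decomposition, along the following lines; scipy also provides a dedicated routine, scipy.linalg.polar, but the SVD route below is self-contained, and the test matrix is our own random choice:

```python
import numpy as np

A = np.random.randn(3, 3) + 1j * np.random.randn(3, 3)   # generically invertible
V, S, Wh = np.linalg.svd(A)                # A = V S W*, with Wh = W*
absA = Wh.conj().T @ np.diag(S) @ Wh       # |A| = sqrt(A*A) = W S W*
U = V @ Wh                                 # the unitary part, U = V W*
print(np.allclose(A, U @ absA))                           # A = U |A|
print(np.allclose(U.conj().T @ U, np.eye(3)))             # U is unitary
print(np.allclose(absA, absA.conj().T))                   # |A| is self-adjoint (and positive)
```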
We will be back to this in chapter 8 below, when doing spectral theory.
Good news, we are done with linear algebra. We have learned many things in the past 100 pages, and our knowledge of the subject is quite decent, and we will stop here. In the remainder of the present book we will be rather looking into applications.
This being said, there are many possible continuations of what we learned. As a first piece of advice, the more linear algebra that you know, the better your mathematics will be, so try reading some more. A good linear algebra book, written by an analyst, is the one by Lax [1]. Another good linear algebra book, or rather book about algebra at large, written this time by an algebraist, is the one by Lang [2]. A more advanced book, which is more than enough for most daily tasks, is the one by Horn and Johnson [3]. So, keep an eye on these books, and have them ready for a quick read, when needed.
General references
Banica, Teo (2024). "Linear algebra and group theory". arXiv:2206.09283 [math.CO].