12b. Matrices, positivity



According to what we know so far, the local extrema of a function [math]f:\mathbb R^N\to\mathbb R[/math] appear at points where [math]f'(x)=0[/math], and depend on the positivity properties of the Hessian matrix [math]f''(x)\in M_N(\mathbb R)[/math]. In view of this, and of the Taylor formula at order 2 in general, the best method for dealing with a function [math]f:\mathbb R^N\to\mathbb R[/math] satisfying [math]f'(x)=0[/math] is to change the basis around [math]x\in\mathbb R^N[/math] so as to have the Hessian matrix [math]f''(x)\in M_N(\mathbb R)[/math] diagonalized, provided of course that this diagonalization is possible. Thus, forgetting now about functions and analysis, we are led into the following linear algebra question:

Question

Which symmetric matrices [math]A\in M_N(\mathbb R)[/math] have the property

[[math]] \lt At,t \gt \geq0 [[/math]]

for any [math]t\in\mathbb R^N[/math]? Also, when is a matrix [math]A\in M_N(\mathbb R)[/math] diagonalizable?

Observe that in the second question we have lifted the assumption that [math]A[/math] is symmetric, and this for good reason. Indeed, besides the local extremum problematics for the functions [math]f:\mathbb R^N\to\mathbb R[/math], discussed above, we have as well the quite interesting problem of studying the functions [math]f:\mathbb R^N\to\mathbb R^N[/math], whose derivatives are square matrices, [math]f'(x)\in M_N(\mathbb R)[/math], waiting to be diagonalized too. So, we are trying here to shoot two rabbits at the same time, with our Question 12.10, as formulated above.


Obviously, hunting matters, so time to ask the expert. And the expert says:

Cat

In the lack of speed and claws, yes, develop as much linear algebra as you can. Without even caring for applications, these will come naturally.

Thanks cat, so this will be our plan for this section: develop as much linear algebra as we can. As for applications, we will most likely leave them to you, reader, for later in life, depending on the precise physics and engineering questions that you will be interested in. And with the advice of course to follow the feline way: relax, and no mercy.


With this plan made, let us go back to the diagonalization question, from chapter 9. We will need here diagonalization results which are far more powerful. We first have:

Theorem

Any matrix [math]A\in M_N(\mathbb C)[/math] which is self-adjoint, [math]A=A^*[/math], is diagonalizable, with the diagonalization being of the following type,

[[math]] A=UDU^* [[/math]]
with [math]U\in U_N[/math], and with [math]D\in M_N(\mathbb R)[/math] diagonal. The converse holds too.


Show Proof

As a first remark, the converse trivially holds, because if we take a matrix of the form [math]A=UDU^*[/math], with [math]U[/math] unitary and [math]D[/math] diagonal and real, then we have:

[[math]] A^* =(UDU^*)^* =UD^*U^* =UDU^* =A [[/math]]


In the other sense now, assume that [math]A[/math] is self-adjoint, [math]A=A^*[/math]. Our first claim is that the eigenvalues are real. Indeed, assuming [math]Av=\lambda v[/math], we have:

[[math]] \begin{eqnarray*} \lambda \lt v,v \gt &=& \lt \lambda v,v \gt \\ &=& \lt Av,v \gt \\ &=& \lt v,Av \gt \\ &=& \lt v,\lambda v \gt \\ &=&\bar{\lambda} \lt v,v \gt \end{eqnarray*} [[/math]]


Thus we obtain [math]\lambda\in\mathbb R[/math], as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

[[math]] Av=\lambda v\quad,\quad Aw=\mu w [[/math]]


We have then the following computation, using [math]\lambda,\mu\in\mathbb R[/math]:

[[math]] \begin{eqnarray*} \lambda \lt v,w \gt &=& \lt \lambda v,w \gt \\ &=& \lt Av,w \gt \\ &=& \lt v,Aw \gt \\ &=& \lt v,\mu w \gt \\ &=&\mu \lt v,w \gt \end{eqnarray*} [[/math]]


Thus [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math], as claimed. In order now to finish the proof, it remains to prove that the eigenspaces of [math]A[/math] span the whole space [math]\mathbb C^N[/math]. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix:

[[math]] Av=\lambda v [[/math]]


Assuming now that we have a vector [math]w[/math] orthogonal to it, [math]v\perp w[/math], we have:

[[math]] \begin{eqnarray*} \lt Aw,v \gt &=& \lt w,Av \gt \\ &=& \lt w,\lambda v \gt \\ &=&\lambda \lt w,v \gt \\ &=&0 \end{eqnarray*} [[/math]]


Thus, if [math]v[/math] is an eigenvector, then the vector space [math]v^\perp[/math] is invariant under [math]A[/math]. Moreover, since a matrix [math]A[/math] is self-adjoint precisely when [math] \lt Av,v \gt \in\mathbb R[/math] for any vector [math]v\in\mathbb C^N[/math], as one can see by expanding the scalar product, the restriction of [math]A[/math] to the subspace [math]v^\perp[/math] is self-adjoint. Thus, we can proceed by recurrence, and we obtain the result.
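As a quick numerical illustration of the above theorem, here is a small sketch in Python, using numpy, whose routine eigh is designed precisely for self-adjoint matrices; the matrix used is of course just a random example:

```python
import numpy as np

# A random self-adjoint matrix: A = (B + B*) / 2 satisfies A = A*
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2

# eigh handles self-adjoint matrices: real eigenvalues, unitary passage matrix
eigenvalues, U = np.linalg.eigh(A)
D = np.diag(eigenvalues)

assert np.allclose(U.conj().T @ U, np.eye(4))   # U is unitary
assert np.allclose(A, U @ D @ U.conj().T)       # A = U D U*, with D real diagonal
```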

Observe that, as a consequence of the above result, which you certainly might have heard of before, any symmetric matrix [math]A\in M_N(\mathbb R)[/math] is diagonalizable. In fact, we have:

Proposition

Any matrix [math]A\in M_N(\mathbb R)[/math] which is symmetric, [math]A=A^t[/math], is diagonalizable, with the diagonalization being of the following type,

[[math]] A=UDU^t [[/math]]
with [math]U\in O_N[/math], and with [math]D\in M_N(\mathbb R)[/math] diagonal. The converse holds too.


Show Proof

As before, the converse trivially holds, because if we take a matrix of the form [math]A=UDU^t[/math], with [math]U[/math] orthogonal and [math]D[/math] diagonal and real, then we have [math]A^t=A[/math]. In the other sense now, this follows from Theorem 12.12, and its proof.
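The same numerical check can be made over the reals; note that for a real symmetric input, the numpy routine eigh returns a real, hence orthogonal, passage matrix. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # real symmetric matrix

eigenvalues, U = np.linalg.eigh(A)      # for real input, U comes out real
assert np.allclose(U.T @ U, np.eye(4))                  # U is orthogonal
assert np.allclose(A, U @ np.diag(eigenvalues) @ U.T)   # A = U D U^t
```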

As basic examples of self-adjoint matrices, we have the orthogonal projections:

Proposition

The matrices [math]P\in M_N(\mathbb C)[/math] which are projections, [math]P^2=P^*=P[/math], are precisely those which diagonalize as follows,

[[math]] P=UDU^* [[/math]]
with [math]U\in U_N[/math], and with [math]D\in M_N(0,1)[/math] being diagonal.


Show Proof

Since we have [math]P^*=P[/math], by using Theorem 12.12, the eigenvalues must be real. Then, by using [math]P^2=P[/math], assuming that we have [math]Pv=\lambda v[/math], we obtain:

[[math]] \begin{eqnarray*} \lambda \lt v,v \gt &=& \lt \lambda v,v \gt \\ &=& \lt Pv,v \gt \\ &=& \lt P^2v,v \gt \\ &=& \lt Pv,Pv \gt \\ &=& \lt \lambda v,\lambda v \gt \\ &=&\lambda^2 \lt v,v \gt \end{eqnarray*} [[/math]]


We therefore have [math]\lambda\in\{0,1\}[/math], as claimed, and as a final conclusion here, the diagonalization of the projection matrices is as follows, with [math]e_i\in \{0,1\}[/math]:

[[math]] P\sim\begin{pmatrix} e_1\\ &\ddots\\ &&e_N \end{pmatrix} [[/math]]


To be more precise, the number of 1 values is the dimension of the image of [math]P[/math].
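For a concrete example, here is a quick sketch of the rank-one projection onto the span of a unit vector [math]v[/math], namely [math]P=vv^*[/math], checked numerically with numpy:

```python
import numpy as np

# Orthogonal projection onto the span of a unit vector v: P = v v*
v = np.array([1.0, 1.0j, 0.0, 2.0])
v = v / np.linalg.norm(v)
P = np.outer(v, v.conj())

assert np.allclose(P @ P, P)            # P^2 = P
assert np.allclose(P, P.conj().T)       # P* = P

# Eigenvalues in {0, 1}; the number of 1 values equals the rank, here 1
print(np.round(np.linalg.eigvalsh(P), 10))
```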

In the real case, the result regarding the projections is as follows:

Proposition

The matrices [math]P\in M_N(\mathbb R)[/math] which are projections, [math]P^2=P^t=P[/math], are precisely those which diagonalize as follows,

[[math]] P=UDU^t [[/math]]
with [math]U\in O_N[/math], and with [math]D\in M_N(0,1)[/math] being diagonal.


Show Proof

This follows indeed from Proposition 12.14, and its proof.

An important class of self-adjoint matrices, which includes for instance all the projections, are the positive matrices. The theory here is as follows:

Theorem

For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent, and if they are satisfied, we say that [math]A[/math] is positive:

  • [math]A=B^2[/math], with [math]B=B^*[/math].
  • [math]A=CC^*[/math], for some [math]C\in M_N(\mathbb C)[/math].
  • [math] \lt Ax,x \gt \geq0[/math], for any vector [math]x\in\mathbb C^N[/math].
  • [math]A=A^*[/math], and the eigenvalues are positive, [math]\lambda_i\geq0[/math].
  • [math]A=UDU^*[/math], with [math]U\in U_N[/math] and with [math]D\in M_N(\mathbb R_+)[/math] diagonal.


Show Proof

The idea is that the equivalences in the statement basically follow from some elementary computations, with only Theorem 12.12 needed, at some point:


[math](1)\implies(2)[/math] This is clear, because we can take [math]C=B[/math].


[math](2)\implies(3)[/math] This follows from the following computation:

[[math]] \begin{eqnarray*} \lt Ax,x \gt &=& \lt CC^*x,x \gt \\ &=& \lt C^*x,C^*x \gt \\ &\geq&0 \end{eqnarray*} [[/math]]


[math](3)\implies(4)[/math] By using the fact that [math] \lt Ax,x \gt [/math] is real, we have:

[[math]] \begin{eqnarray*} \lt Ax,x \gt &=& \lt x,A^*x \gt \\ &=& \lt A^*x,x \gt \end{eqnarray*} [[/math]]


Thus we have [math]A=A^*[/math], and the remaining assertion, regarding the eigenvalues, follows from the following computation, assuming [math]Ax=\lambda x[/math]:

[[math]] \begin{eqnarray*} \lt Ax,x \gt &=& \lt \lambda x,x \gt \\ &=&\lambda \lt x,x \gt \\ &\geq&0 \end{eqnarray*} [[/math]]


[math](4)\implies(5)[/math] This follows indeed by using Theorem 12.12.


[math](5)\implies(1)[/math] Assuming [math]A=UDU^*[/math] with [math]U\in U_N[/math], and with [math]D\in M_N(\mathbb R_+)[/math] diagonal, we can set [math]B=U\sqrt{D}U^*[/math]. Then [math]B[/math] is self-adjoint, and its square is given by:

[[math]] \begin{eqnarray*} B^2 &=&U\sqrt{D}U^*\cdot U\sqrt{D}U^*\\ &=&UDU^*\\ &=&A \end{eqnarray*} [[/math]]


Thus, we are led to the conclusion in the statement.
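As an illustration, the equivalence between conditions (2), (3), (4) above can be checked numerically on a random example, as in the following sketch, with the small tolerances being there only for round-off errors:

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = C @ C.conj().T                       # condition (2): A = C C*

# condition (4): A is self-adjoint, with eigenvalues >= 0
assert np.allclose(A, A.conj().T)
assert np.all(np.linalg.eigvalsh(A) >= -1e-10)

# condition (3): <Ax, x> >= 0, tested on a few random vectors
for _ in range(100):
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    assert np.vdot(x, A @ x).real >= -1e-10
```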

Let us record as well the following technical version of the above result:

Theorem

For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent, and if they are satisfied, we say that [math]A[/math] is strictly positive:

  • [math]A=B^2[/math], with [math]B=B^*[/math], invertible.
  • [math]A=CC^*[/math], for some [math]C\in M_N(\mathbb C)[/math] invertible.
  • [math] \lt Ax,x \gt \gt 0[/math], for any nonzero vector [math]x\in\mathbb C^N[/math].
  • [math]A=A^*[/math], and the eigenvalues are strictly positive, [math]\lambda_i \gt 0[/math].
  • [math]A=UDU^*[/math], with [math]U\in U_N[/math] and with [math]D\in M_N(\mathbb R_+^*)[/math] diagonal.


Show Proof

This follows either from Theorem 12.16, by adding the above various extra assumptions, or from the proof of Theorem 12.16, by modifying where needed.
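In practice, strict positivity is often tested via the Cholesky factorization [math]A=LL^t[/math], with [math]L[/math] lower triangular and invertible, which is a particular instance of condition (2) above. A sketch of this, using numpy:

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.standard_normal((4, 4))
A = C @ C.T + 0.1 * np.eye(4)       # strictly positive, by construction

L = np.linalg.cholesky(A)           # succeeds precisely for strictly positive matrices
assert np.allclose(A, L @ L.T)      # A = L L^t, with L lower triangular, invertible

try:
    np.linalg.cholesky(-A)          # -A is not positive, so this must fail
except np.linalg.LinAlgError:
    print("-A is not strictly positive")
```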

Let us discuss now the case of the unitary matrices. We have here:

Theorem

Any matrix [math]U\in M_N(\mathbb C)[/math] which is unitary, [math]U^*=U^{-1}[/math], is diagonalizable, with the eigenvalues on [math]\mathbb T[/math]. More precisely we have

[[math]] U=VDV^* [[/math]]
with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] diagonal. The converse holds too.


Show Proof

As a first remark, the converse trivially holds, because given a matrix of type [math]U=VDV^*[/math], with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] being diagonal, we have:

[[math]] \begin{eqnarray*} U^* &=&(VDV^*)^*\\ &=&VD^*V^*\\ &=&VD^{-1}V^{-1}\\ &=&(V^*)^{-1}D^{-1}V^{-1}\\ &=&(VDV^*)^{-1}\\ &=&U^{-1} \end{eqnarray*} [[/math]]


Let us prove now the first assertion, stating that the eigenvalues of a unitary matrix [math]U\in U_N[/math] belong to [math]\mathbb T[/math]. Indeed, assuming [math]Uv=\lambda v[/math], we have:

[[math]] \begin{eqnarray*} \lt v,v \gt &=& \lt U^*Uv,v \gt \\ &=& \lt Uv,Uv \gt \\ &=& \lt \lambda v,\lambda v \gt \\ &=&|\lambda|^2 \lt v,v \gt \end{eqnarray*} [[/math]]


Thus we obtain [math]\lambda\in\mathbb T[/math], as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

[[math]] Uv=\lambda v\quad,\quad Uw=\mu w [[/math]]


We have then the following computation, using [math]U^*=U^{-1}[/math] and [math]\lambda,\mu\in\mathbb T[/math]:

[[math]] \begin{eqnarray*} \lambda \lt v,w \gt &=& \lt \lambda v,w \gt \\ &=& \lt Uv,w \gt \\ &=& \lt v,U^*w \gt \\ &=& \lt v,U^{-1}w \gt \\ &=& \lt v,\mu^{-1}w \gt \\ &=&\mu \lt v,w \gt \end{eqnarray*} [[/math]]


Thus [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math], as claimed. In order now to finish the proof, it remains to prove that the eigenspaces of [math]U[/math] span the whole space [math]\mathbb C^N[/math]. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix:

[[math]] Uv=\lambda v [[/math]]


Assuming that we have a vector [math]w[/math] orthogonal to it, [math]v\perp w[/math], we have:

[[math]] \begin{eqnarray*} \lt Uw,v \gt &=& \lt w,U^*v \gt \\ &=& \lt w,U^{-1}v \gt \\ &=& \lt w,\lambda^{-1}v \gt \\ &=&\lambda \lt w,v \gt \\ &=&0 \end{eqnarray*} [[/math]]


Thus, if [math]v[/math] is an eigenvector, then the vector space [math]v^\perp[/math] is invariant under [math]U[/math]. Now since [math]U[/math] is an isometry, so is its restriction to this space [math]v^\perp[/math]. Thus this restriction is a unitary, and so we can proceed by recurrence, and we obtain the result.
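Here is a quick numerical illustration of the above theorem; a random unitary matrix can be produced as the Q factor of a QR decomposition, and its eigenvalues indeed land on the unit circle:

```python
import numpy as np

# A random unitary matrix, as the Q factor of a QR decomposition
rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(B)
assert np.allclose(U.conj().T @ U, np.eye(4))

# Eigenvalues on the unit circle T
eigenvalues, V = np.linalg.eig(U)
assert np.allclose(np.abs(eigenvalues), 1)

# Generically the eigenvalues are distinct, and U = V D V^{-1}, with V invertible
assert np.allclose(U, V @ np.diag(eigenvalues) @ np.linalg.inv(V))
```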

Let us record as well the real version of the above result, in a weak form:

Proposition

Any matrix [math]U\in M_N(\mathbb R)[/math] which is orthogonal, [math]U^t=U^{-1}[/math], is diagonalizable, with the eigenvalues on [math]\mathbb T[/math]. More precisely we have

[[math]] U=VDV^* [[/math]]
with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] being diagonal.


Show Proof

This follows indeed from Theorem 12.18.

Observe that the above result does not provide us with a complete characterization of the matrices [math]U\in M_N(\mathbb R)[/math] which are orthogonal. To be more precise, the question left is that of understanding when the matrices of type [math]U=VDV^*[/math], with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] being diagonal, are real, and this is something non-trivial.


As an illustration, for the simplest unitaries that we know, namely the rotations in the real plane, we have the following formula, that we know well from chapter 9:

[[math]] \begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix} =\frac{1}{2}\begin{pmatrix}1&1\\i&-i\end{pmatrix} \begin{pmatrix}e^{-it}&0\\0&e^{it}\end{pmatrix} \begin{pmatrix}1&-i\\1&i\end{pmatrix} [[/math]]
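This formula can be checked by direct multiplication, or numerically, as in the following small sketch, where the passage matrix is written in normalized form:

```python
import numpy as np

t = 0.7
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

V = np.array([[1, 1], [1j, -1j]]) / np.sqrt(2)        # unitary passage matrix
D = np.diag([np.exp(-1j * t), np.exp(1j * t)])

assert np.allclose(R, V @ D @ V.conj().T)             # R = V D V*
```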


Back to generalities, the self-adjoint matrices and the unitary matrices are particular cases of the general notion of a “normal matrix”, and we have here:

Theorem

Any matrix [math]A\in M_N(\mathbb C)[/math] which is normal, [math]AA^*=A^*A[/math], is diagonalizable, with the diagonalization being of the following type,

[[math]] A=UDU^* [[/math]]
with [math]U\in U_N[/math], and with [math]D\in M_N(\mathbb C)[/math] diagonal. The converse holds too.


Show Proof

As a first remark, the converse trivially holds, because if we take a matrix of the form [math]A=UDU^*[/math], with [math]U[/math] unitary and [math]D[/math] diagonal, then we have:

[[math]] \begin{eqnarray*} AA^* &=&UDU^*\cdot UD^*U^*\\ &=&UDD^*U^*\\ &=&UD^*DU^*\\ &=&UD^*U^*\cdot UDU^*\\ &=&A^*A \end{eqnarray*} [[/math]]


In the other sense now, this is something more technical. Our first claim is that a matrix [math]A[/math] is normal precisely when the following happens, for any vector [math]v[/math]:

[[math]] ||Av||=||A^*v|| [[/math]]


Indeed, the above equality can be written as follows:

[[math]] \lt AA^*v,v \gt = \lt A^*Av,v \gt [[/math]]


But this is equivalent to [math]AA^*=A^*A[/math], by expanding the scalar products. Our claim now is that [math]A,A^*[/math] have the same eigenvectors, with conjugate eigenvalues:

[[math]] Av=\lambda v\implies A^*v=\bar{\lambda}v [[/math]]


Indeed, this follows from the following computation, and from the trivial fact that if [math]A[/math] is normal, then so is any matrix of type [math]A-\lambda 1_N[/math]:

[[math]] \begin{eqnarray*} ||(A^*-\bar{\lambda}1_N)v|| &=&||(A-\lambda 1_N)^*v||\\ &=&||(A-\lambda 1_N)v||\\ &=&0 \end{eqnarray*} [[/math]]


Let us prove now, by using this, that the eigenspaces of [math]A[/math] are pairwise orthogonal. Assume that we have two eigenvectors, corresponding to different eigenvalues, [math]\lambda\neq\mu[/math]:

[[math]] Av=\lambda v\quad,\quad Aw=\mu w [[/math]]


We have the following computation, which shows that [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math]:

[[math]] \begin{eqnarray*} \lambda \lt v,w \gt &=& \lt \lambda v,w \gt \\ &=& \lt Av,w \gt \\ &=& \lt v,A^*w \gt \\ &=& \lt v,\bar{\mu}w \gt \\ &=&\mu \lt v,w \gt \end{eqnarray*} [[/math]]


In order to finish, it remains to prove that the eigenspaces of [math]A[/math] span the whole [math]\mathbb C^N[/math]. This is something that we have already seen for the self-adjoint matrices, and for unitaries, and we will use here these results, in order to deal with the general normal case. As a first observation, given an arbitrary matrix [math]A[/math], the matrix [math]AA^*[/math] is self-adjoint:

[[math]] (AA^*)^*=AA^* [[/math]]


Thus, we can diagonalize this matrix [math]AA^*[/math], as follows, with the passage matrix being a unitary, [math]V\in U_N[/math], and with the diagonal form being real, [math]E\in M_N(\mathbb R)[/math]:

[[math]] AA^*=VEV^* [[/math]]


Now observe that, for the matrices of type [math]A=UDU^*[/math] that we are trying to construct, we have the following formulae:

[[math]] V=U\quad,\quad E=D\bar{D} [[/math]]


In particular, the matrices [math]A[/math] and [math]AA^*[/math] have the same eigenspaces. So, this will be our idea: proving that the eigenspaces of [math]AA^*[/math] are invariant under [math]A[/math]. In order to do so, let us pick two eigenvectors [math]v,w[/math] of the matrix [math]AA^*[/math], corresponding to different eigenvalues, [math]\lambda\neq\mu[/math]. The eigenvalue equations are then as follows:

[[math]] AA^*v=\lambda v\quad,\quad AA^*w=\mu w [[/math]]


We have the following computation, using the normality condition [math]AA^*=A^*A[/math], and the fact that the eigenvalues of [math]AA^*[/math], and in particular [math]\mu[/math], are real:

[[math]] \begin{eqnarray*} \lambda \lt Av,w \gt &=& \lt \lambda Av,w \gt \\ &=& \lt A\lambda v,w \gt \\ &=& \lt AAA^*v,w \gt \\ &=& \lt AA^*Av,w \gt \\ &=& \lt Av,AA^*w \gt \\ &=& \lt Av,\mu w \gt \\ &=&\mu \lt Av,w \gt \end{eqnarray*} [[/math]]


We conclude that we have [math] \lt Av,w \gt =0[/math]. But this reformulates as follows:

[[math]] \lambda\neq\mu\implies A(E_\lambda)\perp E_\mu [[/math]]


Now since the eigenspaces of [math]AA^*[/math] are pairwise orthogonal, and span the whole [math]\mathbb C^N[/math], we deduce from this that these eigenspaces are invariant under [math]A[/math]:

[[math]] A(E_\lambda)\subset E_\lambda [[/math]]


But with this result in hand, we can finish. Indeed, we can decompose the problem, and the matrix [math]A[/math] itself, following these eigenspaces of [math]AA^*[/math], which in practice amounts to saying that we can assume that we have only one eigenspace. By rescaling, this is the same as assuming that we have [math]AA^*=1[/math], and so we are now in the unitary case, which we know how to solve, as explained in Theorem 12.18.
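As an illustration of the above theorem, here is a sketch of a normal, but not self-adjoint, matrix, built directly in the form [math]A=UDU^*[/math], with [math]D[/math] complex diagonal, and checked numerically:

```python
import numpy as np

# Build a normal, but not self-adjoint, matrix A = U D U*, with D complex diagonal
rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(B)                                    # random unitary
d = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # complex eigenvalues
A = U @ np.diag(d) @ U.conj().T

assert np.allclose(A @ A.conj().T, A.conj().T @ A)        # A is normal
assert not np.allclose(A, A.conj().T)                     # but not self-adjoint

# The eigenvalues of A are the entries of d, up to ordering
assert np.allclose(np.sort_complex(np.linalg.eigvals(A)), np.sort_complex(d))
```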

As a first application of all this, we have the following result:

Theorem

Given a matrix [math]A\in M_N(\mathbb C)[/math], we can construct a matrix [math]|A|[/math] as follows, by using the fact that [math]A^*A[/math] is diagonalizable, with positive eigenvalues:

[[math]] |A|=\sqrt{A^*A} [[/math]]
This matrix [math]|A|[/math] is then positive, and its square is [math]|A|^2=A^*A[/math]. In the case [math]N=1[/math], we obtain in this way the usual absolute value of the complex numbers.


Show Proof

Consider indeed the matrix [math]A^*A[/math], which is normal. According to Theorem 12.20, we can diagonalize this matrix as follows, with [math]U\in U_N[/math], and with [math]D[/math] diagonal:

[[math]] A^*A=UDU^* [[/math]]


From [math]A^*A\geq0[/math] we obtain [math]D\geq0[/math]. But this means that the entries of [math]D[/math] are real, and positive. Thus we can extract the square root [math]\sqrt{D}[/math], and then set:

[[math]] \sqrt{A^*A}=U\sqrt{D}U^* [[/math]]


Thus, we are basically done. Indeed, if we call this latter matrix [math]|A|[/math], we are led to the conclusions in the statement. Finally, the last assertion is clear from definitions.
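Here is a sketch of the construction of [math]|A|[/math], following the above proof, namely diagonalizing [math]A^*A[/math] and taking the square root of the diagonal:

```python
import numpy as np

def absolute_value(A):
    """|A| = sqrt(A* A), computed via the diagonalization of A* A."""
    eigenvalues, U = np.linalg.eigh(A.conj().T @ A)      # A* A is self-adjoint, positive
    eigenvalues = np.clip(eigenvalues, 0, None)          # clean up tiny negative round-off
    return U @ np.diag(np.sqrt(eigenvalues)) @ U.conj().T

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
M = absolute_value(A)

assert np.allclose(M, M.conj().T)                        # |A| is self-adjoint
assert np.all(np.linalg.eigvalsh(M) >= -1e-10)           # and positive
assert np.allclose(M @ M, A.conj().T @ A)                # with |A|^2 = A* A
```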

We can now formulate a first polar decomposition result, as follows:

Theorem

Any invertible matrix [math]A\in M_N(\mathbb C)[/math] decomposes as

[[math]] A=U|A| [[/math]]
with [math]U\in U_N[/math], and with [math]|A|=\sqrt{A^*A}[/math] as above.


Show Proof

This is routine, and follows by comparing the actions of [math]A,|A|[/math] on the vectors [math]v\in\mathbb C^N[/math]. Indeed, since [math]|A|^2=A^*A[/math], we have [math]||\,|A|v\,||=||Av||[/math] for any vector [math]v\in\mathbb C^N[/math], both squared norms being equal to [math] \lt A^*Av,v \gt [/math]. Thus the formula [math]U(|A|v)=Av[/math] defines an isometry on the image of [math]|A|[/math], which is all of [math]\mathbb C^N[/math] when [math]A[/math] is invertible. We therefore obtain a unitary [math]U\in U_N[/math] with [math]A=U|A|[/math], as above.
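Numerically, for an invertible matrix, the unitary part can simply be computed as [math]U=A|A|^{-1}[/math]; here is a sketch of this:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))  # invertible, generically

# |A| = sqrt(A* A), via the diagonalization of A* A
eigenvalues, V = np.linalg.eigh(A.conj().T @ A)
M = V @ np.diag(np.sqrt(np.clip(eigenvalues, 0, None))) @ V.conj().T

U = A @ np.linalg.inv(M)                                 # the unitary part

assert np.allclose(U.conj().T @ U, np.eye(4))            # U is unitary
assert np.allclose(A, U @ M)                             # A = U |A|
```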

Observe that at [math]N=1[/math] we obtain in this way the usual polar decomposition of the nonzero complex numbers. More generally now, we have the following result:

Theorem

Any square matrix [math]A\in M_N(\mathbb C)[/math] decomposes as

[[math]] A=U|A| [[/math]]
with [math]U[/math] being a partial isometry, and with [math]|A|=\sqrt{A^*A}[/math] as above.


Show Proof

Again, this follows by comparing the actions of [math]A,|A|[/math] on the vectors [math]v\in\mathbb C^N[/math], and deducing from this the existence of a partial isometry [math]U[/math] as above. Alternatively, we can get this from Theorem 12.22, applied on the complement of the 0-eigenvectors.
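For a ready-made routine, scipy provides scipy.linalg.polar, which computes such a decomposition via the singular value decomposition; in the singular case it actually returns a full unitary, which is one possible completion of the partial isometry above. A small sketch:

```python
import numpy as np
from scipy.linalg import polar

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])          # a singular matrix, of rank 1

U, P = polar(A)                     # A = U P, with P = |A| positive

assert np.allclose(A, U @ P)
assert np.allclose(U.conj().T @ U, np.eye(2))   # here U is even unitary
assert np.allclose(P @ P, A.conj().T @ A)       # P^2 = A* A
```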

So long for advanced linear algebra. There are actually many other decomposition results for the real matrices, quite often in relation with positivity, and the Jordan form too, which are all useful for questions in analysis, via derivatives and Hessians.


Good luck of course in learning all this, when needed later in life, for your various math problems at that time. And always keep in mind the expert's advice, Cat 12.11.

General references

Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].