Linear algebra


1a. Linear maps

According to various findings in physics, starting with those of Heisenberg from the early 1920s, basic quantum mechanics involves linear operators [math]T:H\to H[/math] from a complex Hilbert space [math]H[/math] to itself. The space [math]H[/math] is typically infinite dimensional, a basic example being the Schrödinger space [math]H=L^2(\mathbb R^3)[/math] of the wave functions [math]\psi:\mathbb R^3\to\mathbb C[/math] of the electron. In fact, in what regards the electron, this space [math]H=L^2(\mathbb R^3)[/math] is basically the correct one, with the only adjustment needed, due to Pauli and others, being that of tensoring with a copy of [math]K=\mathbb C^2[/math], in order to account for the electron spin.


But more on this later. Let us start this book more modestly, as follows:

Fact

We are interested in quantum mechanics, taking place in infinite dimensions, but as a main source of inspiration we will have [math]H=\mathbb C^N[/math], with scalar product

[[math]] \lt x,y \gt =\sum_ix_i\bar{y}_i [[/math]]

with the linearity at left being the standard mathematical convention. More specifically, we will be interested in the mathematics of the linear operators [math]T:H\to H[/math].

The point now, that you surely know about, is that the above operators [math]T:H\to H[/math] correspond to the square matrices [math]A\in M_N(\mathbb C)[/math]. Thus, as a preliminary to what we want to do in this book, we need a good knowledge of linear algebra over [math]\mathbb C[/math].


You probably know linear algebra well, but it is always good to recall it, and this will be the purpose of the present chapter. Let us start with the very basics:

Theorem

The linear maps [math]T:\mathbb C^N\to\mathbb C^N[/math] are in correspondence with the square matrices [math]A\in M_N(\mathbb C)[/math], with the linear map associated to such a matrix being

[[math]] Tx=Ax [[/math]]
and with the matrix associated to a linear map being [math]A_{ij}= \lt Te_j,e_i \gt [/math].


Show Proof

The first assertion is clear, because a linear map [math]T:\mathbb C^N\to\mathbb C^N[/math] must send a vector [math]x\in\mathbb C^N[/math] to a certain vector [math]Tx\in\mathbb C^N[/math], all of whose components are linear combinations of the components of [math]x[/math]. Thus, we can write, for certain complex numbers [math]A_{ij}\in\mathbb C[/math]:

[[math]] T\begin{pmatrix} x_1\\ \vdots\\ \vdots\\ x_N \end{pmatrix} =\begin{pmatrix} A_{11}x_1+\ldots+A_{1N}x_N\\ \vdots\\ \vdots\\ A_{N1}x_1+\ldots+A_{NN}x_N \end{pmatrix} [[/math]]


Now the parameters [math]A_{ij}\in\mathbb C[/math] can be regarded as being the entries of a square matrix [math]A\in M_N(\mathbb C)[/math], and with the usual convention for matrix multiplication, we have:

[[math]] Tx=Ax [[/math]]


Regarding the second assertion, with [math]Tx=Ax[/math] as above, if we denote by [math]e_1,\ldots,e_N[/math] the standard basis of [math]\mathbb C^N[/math], then we have the following formula:

[[math]] Te_j =\begin{pmatrix} A_{1j}\\ \vdots\\ \vdots\\ A_{Nj} \end{pmatrix} [[/math]]


But this gives the second formula, [math] \lt Te_j,e_i \gt =A_{ij}[/math], as desired.
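
If you want to check this correspondence on a computer, here is a minimal numerical sketch, assuming numpy, which recovers the entries via [math]A_{ij}= \lt Te_j,e_i \gt [/math]; the helper sp below is our own ad-hoc implementation of the left-linear scalar product, not a library routine:

```python
import numpy as np

# Scalar product with linearity at left, <x,y> = sum_i x_i * conj(y_i), as in
# the convention above (note that numpy's own vdot conjugates the FIRST
# argument, so we write the sum out explicitly).
def sp(x, y):
    return np.sum(x * np.conj(y))

N = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

T = lambda x: A @ x        # the linear map associated to A
e = np.eye(N)              # standard basis e_1,...,e_N, as columns

# Recover the matrix entries as A_ij = <T e_j, e_i>
B = np.array([[sp(T(e[:, j]), e[:, i]) for j in range(N)] for i in range(N)])
print(np.allclose(A, B))   # True
```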

Our claim now is that, no matter what we want to do with [math]T[/math] or [math]A[/math], of advanced type, we will run at some point into their adjoints [math]T^*[/math] and [math]A^*[/math], constructed as follows:

Theorem

The adjoint operator [math]T^*:\mathbb C^N\to\mathbb C^N[/math], which is given by

[[math]] \lt Tx,y \gt = \lt x,T^*y \gt [[/math]]
corresponds to the adjoint matrix [math]A^*\in M_N(\mathbb C)[/math], given by

[[math]] (A^*)_{ij}=\bar{A}_{ji} [[/math]]
via the correspondence between linear maps and matrices constructed above.


Show Proof

Given a linear map [math]T:\mathbb C^N\to\mathbb C^N[/math], fix [math]y\in\mathbb C^N[/math], and consider the linear form [math]\varphi(x)= \lt Tx,y \gt [/math]. This form must be as follows, for a certain vector [math]T^*y\in\mathbb C^N[/math]:

[[math]] \varphi(x)= \lt x,T^*y \gt [[/math]]


Thus, we have constructed a map [math]y\to T^*y[/math] as in the statement, which is clearly linear, and which we call [math]T^*[/math]. Now by taking the vectors [math]x,y\in\mathbb C^N[/math] to be elements of the standard basis of [math]\mathbb C^N[/math], our defining formula for [math]T^*[/math] reads:

[[math]] \lt Te_i,e_j \gt = \lt e_i,T^*e_j \gt [[/math]]


By reversing the scalar product on the right, this formula can be written as:

[[math]] \lt T^*e_j,e_i \gt =\overline{ \lt Te_i,e_j \gt } [[/math]]


But this means that the matrix of [math]T^*[/math] is given by [math](A^*)_{ij}=\bar{A}_{ji}[/math], as desired.
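
Numerically, the adjoint is simply the conjugate transpose, and the defining formula [math] \lt Ax,y \gt = \lt x,A^*y \gt [/math] can be verified as in the following small sketch, again assuming numpy, and with sp being our own ad-hoc scalar product from before:

```python
import numpy as np

def sp(x, y):              # <x,y> = sum_i x_i * conj(y_i), linear at left
    return np.sum(x * np.conj(y))

N = 4
rng = np.random.default_rng(1)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A_star = A.conj().T        # (A*)_ij = conj(A_ji), the conjugate transpose

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# The defining property <Ax,y> = <x,A*y> of the adjoint
print(np.isclose(sp(A @ x, y), sp(x, A_star @ y)))   # True
```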

Getting back to our claim, the adjoints [math]*[/math] are indeed ubiquitous, as shown by:

Theorem

The following happen:

  • [math]T(x)=Ux[/math] with [math]U\in M_N(\mathbb C)[/math] is an isometry precisely when [math]U^*=U^{-1}[/math].
  • [math]T(x)=Px[/math] with [math]P\in M_N(\mathbb C)[/math] is a projection precisely when [math]P^2=P^*=P[/math].


Show Proof

Let us first recall that the lengths, or norms, of the vectors [math]x\in\mathbb C^N[/math] can be recovered from the knowledge of the scalar products, as follows:

[[math]] ||x||=\sqrt{ \lt x,x \gt } [[/math]]


Conversely, we can recover the scalar products out of norms, by using the following difficult to remember formula, called the complex polarization identity:

[[math]] 4 \lt x,y \gt =||x+y||^2-||x-y||^2+i||x+iy||^2-i||x-iy||^2 [[/math]]


The proof of this latter formula is indeed elementary, as follows:

[[math]] \begin{eqnarray*} &&||x+y||^2-||x-y||^2+i||x+iy||^2-i||x-iy||^2\\ &=&||x||^2+||y||^2-||x||^2-||y||^2+i||x||^2+i||y||^2-i||x||^2-i||y||^2\\ &&+2Re( \lt x,y \gt )+2Re( \lt x,y \gt )+2iIm( \lt x,y \gt )+2iIm( \lt x,y \gt )\\ &=&4 \lt x,y \gt \end{eqnarray*} [[/math]]


Finally, we will use Theorem 1.3, and more specifically the following formula coming from there, valid for any matrix [math]A\in M_N(\mathbb C)[/math] and any two vectors [math]x,y\in\mathbb C^N[/math]:

[[math]] \lt Ax,y \gt = \lt x,A^*y \gt [[/math]]


(1) Given a matrix [math]U\in M_N(\mathbb C)[/math], we have indeed the following equivalences, with the first one coming from the polarization identity, and the other ones being clear:

[[math]] \begin{eqnarray*} ||Ux||=||x|| &\iff& \lt Ux,Uy \gt = \lt x,y \gt \\ &\iff& \lt x,U^*Uy \gt = \lt x,y \gt \\ &\iff&U^*Uy=y\\ &\iff&U^*U=1\\ &\iff&U^*=U^{-1} \end{eqnarray*} [[/math]]


(2) Given a matrix [math]P\in M_N(\mathbb C)[/math], in order for [math]x\to Px[/math] to be an oblique projection, we must have [math]P^2=P[/math]. Now observe that this projection is orthogonal when:

[[math]] \begin{eqnarray*} \lt Px-x,Py \gt =0 &\iff& \lt P^*Px-P^*x,y \gt =0\\ &\iff&P^*Px-P^*x=0\\ &\iff&P^*P-P^*=0\\ &\iff&P^*P=P^* \end{eqnarray*} [[/math]]


The point now is that by conjugating the last formula, we obtain [math]P^*P=P[/math]. Thus we must have [math]P=P^*[/math], and this gives the result.
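
As a quick computer check of these two characterizations, here is a numpy sketch, producing a unitary via a QR decomposition and an orthogonal projection onto a subspace; these are standard constructions, but the particular recipe below is just one possible choice, for illustration:

```python
import numpy as np

N = 4
rng = np.random.default_rng(2)

# A unitary matrix U, obtained from the QR decomposition of a random matrix
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
U, _ = np.linalg.qr(X)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # isometry
print(np.allclose(U.conj().T @ U, np.eye(N)))                 # U* = U^{-1}

# An orthogonal projection, onto the span of the first two columns of U
V = U[:, :2]
P = V @ V.conj().T
print(np.allclose(P @ P, P), np.allclose(P.conj().T, P))      # P^2 = P* = P
```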

Summarizing, the linear operators come in pairs [math]T,T^*[/math], and the associated matrices come as well in pairs [math]A,A^*[/math]. This is something quite interesting, philosophically speaking, and we will keep it in mind, and come back to it later, on numerous occasions.

1b. Diagonalization

Let us discuss now the diagonalization question for the linear maps and matrices. Again, we will be quite brief here, and for more, we refer to any standard linear algebra book. By the way, there will be some complex analysis involved too, and here we refer to Rudin [1]. That book of Rudin will in fact be the one and only true prerequisite for reading the present book, but more on references and reading later.


The basic diagonalization theory, formulated in terms of matrices, is as follows:

Proposition

A vector [math]v\in\mathbb C^N[/math] is called eigenvector of [math]A\in M_N(\mathbb C)[/math], with corresponding eigenvalue [math]\lambda[/math], when [math]A[/math] multiplies by [math]\lambda[/math] in the direction of [math]v[/math]:

[[math]] Av=\lambda v [[/math]]
In the case where [math]\mathbb C^N[/math] has a basis [math]v_1,\ldots,v_N[/math] formed by eigenvectors of [math]A[/math], with corresponding eigenvalues [math]\lambda_1,\ldots,\lambda_N[/math], in this new basis [math]A[/math] becomes diagonal, as follows:

[[math]] A\sim\begin{pmatrix}\lambda_1\\&\ddots\\&&\lambda_N\end{pmatrix} [[/math]]
Equivalently, if we denote by [math]D=diag(\lambda_1,\ldots,\lambda_N)[/math] the above diagonal matrix, and by [math]P=[v_1\ldots v_N][/math] the square matrix formed by the eigenvectors of [math]A[/math], we have:

[[math]] A=PDP^{-1} [[/math]]
In this case we say that the matrix [math]A[/math] is diagonalizable.


Show Proof

This is something which is clear, the idea being as follows:


(1) The first assertion is clear, because the matrix which multiplies each basis element [math]v_i[/math] by a number [math]\lambda_i[/math] is precisely the diagonal matrix [math]D=diag(\lambda_1,\ldots,\lambda_N)[/math].


(2) The second assertion follows from the first one, by changing the basis. We can prove this by a direct computation as well, because we have [math]Pe_i=v_i[/math], and so:

[[math]] \begin{eqnarray*} PDP^{-1}v_i &=&PDe_i\\ &=&P\lambda_ie_i\\ &=&\lambda_iPe_i\\ &=&\lambda_iv_i \end{eqnarray*} [[/math]]


Thus, the matrices [math]A[/math] and [math]PDP^{-1}[/math] coincide, as stated.

Let us recall as well that the basic example of a non-diagonalizable matrix, over the complex numbers as above, is the following matrix:

[[math]] J=\begin{pmatrix}0&1\\0&0\end{pmatrix} [[/math]]


Indeed, we have [math]J\binom{x}{y}=\binom{y}{0}[/math], so the eigenvectors are the vectors of type [math]\binom{x}{0}[/math], all with eigenvalue [math]0[/math]. Thus, we do not have enough eigenvectors to construct a basis of [math]\mathbb C^2[/math].
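
For the computer-minded reader, here is what a numerical eigenvalue routine typically reports for this matrix [math]J[/math]: the eigenvalue [math]0[/math] appears twice, but the eigenvector columns returned span only a [math]1[/math]-dimensional space, confirming non-diagonalizability. This is only a rough sketch, assuming numpy:

```python
import numpy as np

J = np.array([[0., 1.],
              [0., 0.]])

vals, vecs = np.linalg.eig(J)
print(vals)                          # [0. 0.]: the eigenvalue 0, twice
print(np.linalg.matrix_rank(vecs))   # 1: only one eigendirection, no basis
```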


In general, in order to study the diagonalization problem, the idea is that the eigenvectors can be grouped into linear spaces, called eigenspaces, as follows:

Theorem

Let [math]A\in M_N(\mathbb C)[/math], and for any eigenvalue [math]\lambda\in\mathbb C[/math] define the corresponding eigenspace as being the vector space formed by the corresponding eigenvectors:

[[math]] E_\lambda=\left\{v\in\mathbb C^N\Big|Av=\lambda v\right\} [[/math]]
These eigenspaces [math]E_\lambda[/math] are then in a direct sum position, in the sense that given vectors [math]v_1\in E_{\lambda_1},\ldots,v_k\in E_{\lambda_k}[/math] corresponding to different eigenvalues [math]\lambda_1,\ldots,\lambda_k[/math], we have:

[[math]] \sum_ic_iv_i=0\implies c_i=0 [[/math]]
In particular we have the following estimate, with sum over all the eigenvalues,

[[math]] \sum_\lambda\dim(E_\lambda)\leq N [[/math]]
and our matrix is diagonalizable precisely when we have equality.


Show Proof

We prove the first assertion by recurrence on [math]k\in\mathbb N[/math]. Assume by contradiction that we have a formula as follows, with the scalars [math]c_1,\ldots,c_k[/math] being not all zero:

[[math]] c_1v_1+\ldots+c_kv_k=0 [[/math]]


By dividing by one of these scalars, we can assume that our formula is:

[[math]] v_k=c_1v_1+\ldots+c_{k-1}v_{k-1} [[/math]]


Now let us apply [math]A[/math] to this vector. On the left we obtain:

[[math]] Av_k =\lambda_kv_k =\lambda_kc_1v_1+\ldots+\lambda_kc_{k-1}v_{k-1} [[/math]]


On the right we obtain something different, as follows:

[[math]] \begin{eqnarray*} A(c_1v_1+\ldots+c_{k-1}v_{k-1}) &=&c_1Av_1+\ldots+c_{k-1}Av_{k-1}\\ &=&c_1\lambda_1v_1+\ldots+c_{k-1}\lambda_{k-1}v_{k-1} \end{eqnarray*} [[/math]]


We conclude from this that the following equality must hold:

[[math]] \lambda_kc_1v_1+\ldots+\lambda_kc_{k-1}v_{k-1}=c_1\lambda_1v_1+\ldots+c_{k-1}\lambda_{k-1}v_{k-1} [[/math]]


On the other hand, we know by recurrence that the vectors [math]v_1,\ldots,v_{k-1}[/math] must be linearly independent. Thus, the coefficients must be equal, at right and at left:

[[math]] \lambda_kc_1=c_1\lambda_1 [[/math]]

[[math]] \vdots [[/math]]

[[math]] \lambda_kc_{k-1}=c_{k-1}\lambda_{k-1} [[/math]]


Now since at least one of the numbers [math]c_i[/math] must be nonzero, from [math]\lambda_kc_i=c_i\lambda_i[/math] we obtain [math]\lambda_k=\lambda_i[/math], which is a contradiction. Thus our proof by recurrence of the first assertion is complete. As for the second assertion, this follows from the first one.

In order to reach now to more advanced results, we can use the characteristic polynomial, which appears via the following fundamental result:

Theorem

Given a matrix [math]A\in M_N(\mathbb C)[/math], consider its characteristic polynomial:

[[math]] P(x)=\det(A-x1_N) [[/math]]
The eigenvalues of [math]A[/math] are then the roots of [math]P[/math]. Also, we have the inequality

[[math]] \dim(E_\lambda)\leq m_\lambda [[/math]]
where [math]m_\lambda[/math] is the multiplicity of [math]\lambda[/math], as root of [math]P[/math].


Show Proof

The first assertion follows from the following computation, using the fact that a linear map is bijective when the determinant of the associated matrix is nonzero:

[[math]] \begin{eqnarray*} \exists v\neq0,Av=\lambda v &\iff&\exists v\neq0,(A-\lambda 1_N)v=0\\ &\iff&\det(A-\lambda 1_N)=0 \end{eqnarray*} [[/math]]


Regarding now the second assertion, given an eigenvalue [math]\lambda[/math] of our matrix [math]A[/math], consider the dimension [math]d_\lambda=\dim(E_\lambda)[/math] of the corresponding eigenspace. By changing the basis of [math]\mathbb C^N[/math], as for the eigenspace [math]E_\lambda[/math] to be spanned by the first [math]d_\lambda[/math] basis elements, our matrix becomes as follows, with [math]B[/math] being a certain smaller matrix:

[[math]] A\sim\begin{pmatrix}\lambda 1_{d_\lambda}&0\\0&B\end{pmatrix} [[/math]]


We conclude that the characteristic polynomial of [math]A[/math] is of the following form:

[[math]] P_A =P_{\lambda 1_{d_\lambda}}P_B =(\lambda-x)^{d_\lambda}P_B [[/math]]


Thus the multiplicity [math]m_\lambda[/math] of our eigenvalue [math]\lambda[/math], as a root of [math]P[/math], satisfies [math]m_\lambda\geq d_\lambda[/math], and this leads to the conclusion in the statement.

Now recall that we are over [math]\mathbb C[/math], which is something that we have not used yet, in our last two statements. And the point here is that we have the following key result:

Theorem

Any polynomial [math]P\in\mathbb C[X][/math] decomposes as

[[math]] P=c(X-a_1)\ldots (X-a_N) [[/math]]
with [math]c\in\mathbb C[/math] and with [math]a_1,\ldots,a_N\in\mathbb C[/math].


Show Proof

It is enough to prove that a non-constant polynomial [math]P[/math] has at least one root, and we do this by contradiction. Assume that [math]P[/math] has no roots, and pick a number [math]z\in\mathbb C[/math] where [math]|P|[/math] attains its minimum, which exists because [math]|P(x)|\to\infty[/math] with [math]|x|\to\infty[/math]:

[[math]] |P(z)|=\min_{x\in\mathbb C}|P(x)| \gt 0 [[/math]]

Since [math]Q(t)=P(z+t)-P(z)[/math] is a polynomial which vanishes at [math]t=0[/math], this polynomial must be of the form [math]ct^k[/math] + higher terms, with [math]c\neq0[/math], and with [math]k\geq1[/math] being an integer. We obtain from this that, with [math]t\in\mathbb C[/math] small, we have the following estimate:

[[math]] P(z+t)\simeq P(z)+ct^k [[/math]]


Now let us write [math]t=rw[/math], with [math]r \gt 0[/math] small, and with [math]|w|=1[/math]. Our estimate becomes:

[[math]] P(z+rw)\simeq P(z)+cr^kw^k [[/math]]


Now recall that we have assumed [math]P(z)\neq0[/math]. We can therefore choose [math]w\in\mathbb T[/math] such that [math]cw^k[/math] points in the opposite direction to that of [math]P(z)[/math], and we obtain in this way:

[[math]] |P(z+rw)| \simeq|P(z)+cr^kw^k| =|P(z)|-|c|r^k [[/math]]


Now by choosing [math]r \gt 0[/math] small enough, so that the error in the first estimate becomes small, and is overcome by the negative quantity [math]-|c|r^k[/math], we obtain from this:

[[math]] |P(z+rw)| \lt |P(z)| [[/math]]


But this contradicts our definition of [math]z\in\mathbb C[/math], as a point where [math]|P|[/math] attains its minimum. Thus [math]P[/math] has a root, and by recurrence it has [math]N[/math] roots, as stated.

Now by putting everything together, we obtain the following result:

Theorem

Given a matrix [math]A\in M_N(\mathbb C)[/math], consider its characteristic polynomial

[[math]] P(X)=\det(A-X1_N) [[/math]]

then factorize this polynomial, by computing the complex roots, with multiplicities,

[[math]] P(X)=(-1)^N(X-\lambda_1)^{n_1}\ldots(X-\lambda_k)^{n_k} [[/math]]
and finally compute the corresponding eigenspaces, for each eigenvalue found:

[[math]] E_i=\left\{v\in\mathbb C^N\Big|Av=\lambda_iv\right\} [[/math]]
The dimensions of these eigenspaces satisfy then the following inequalities,

[[math]] \dim(E_i)\leq n_i [[/math]]
and [math]A[/math] is diagonalizable precisely when we have equality for any [math]i[/math].


Show Proof

This follows by combining Theorem 1.6, Theorem 1.7 and Theorem 1.8. Indeed, the statement is well formulated, thanks to Theorem 1.8. By summing the inequalities [math]\dim(E_\lambda)\leq m_\lambda[/math] from Theorem 1.7, we obtain an inequality as follows:

[[math]] \sum_\lambda\dim(E_\lambda)\leq\sum_\lambda m_\lambda\leq N [[/math]]


On the other hand, we know from Theorem 1.6 that our matrix is diagonalizable when we have global equality. Thus, we are led to the conclusion in the statement.

This was for the main result of linear algebra. There are countless applications of this, and generally speaking, advanced linear algebra consists in building on Theorem 1.9.


In practice, diagonalizing a matrix remains something quite complicated. Let us record a useful algorithmic version of the above result, as follows:

Theorem

The square matrices [math]A\in M_N(\mathbb C)[/math] can be diagonalized as follows:

  • Compute the characteristic polynomial.
  • Factorize the characteristic polynomial.
  • Compute the eigenvectors, for each eigenvalue found.
  • If there are fewer than [math]N[/math] linearly independent eigenvectors, [math]A[/math] is not diagonalizable.
  • Otherwise, [math]A[/math] is diagonalizable, [math]A=PDP^{-1}[/math].


Show Proof

This is an informal reformulation of Theorem 1.9, with (4) referring to the total number of linearly independent eigenvectors found in (3), and with [math]A=PDP^{-1}[/math] in (5) being the usual diagonalization formula, with [math]P,D[/math] being as before.
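
Here is a rough numerical transcription of this recipe, assuming numpy; it is only a sketch, which tests for a full set of eigenvectors via a rank computation with an arbitrary tolerance, and is not meant as a robust algorithm for borderline cases:

```python
import numpy as np

def diagonalize(A, tol=1e-10):
    """Try to write A = P D P^{-1}; a rough numerical transcription of the
    recipe above, not a robust algorithm for defective or borderline cases."""
    vals, P = np.linalg.eig(A)                    # eigenvalues and eigenvectors
    if np.linalg.matrix_rank(P, tol=tol) < A.shape[0]:
        return None                               # not enough eigenvectors
    return P, np.diag(vals)

A = np.array([[2., 1.],
              [1., 2.]])
P, D = diagonalize(A)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))       # True
print(diagonalize(np.array([[0., 1.], [0., 0.]])))    # None: the Jordan block
```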

As an illustration for all this, which is a must-know computation, we have:

Proposition

The rotation of angle [math]t\in\mathbb R[/math] in the plane diagonalizes as:

[[math]] \begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix} =\frac{1}{2}\begin{pmatrix}1&1\\i&-i\end{pmatrix} \begin{pmatrix}e^{-it}&0\\0&e^{it}\end{pmatrix} \begin{pmatrix}1&-i\\1&i\end{pmatrix} [[/math]]
Over the reals this is impossible, unless [math]t=0,\pi[/math], where the rotation is diagonal.


Show Proof

Observe first that, as indicated, unless we are in the case [math]t=0,\pi[/math], where our rotation is [math]\pm1_2[/math], our rotation is a “true” rotation, having no eigenvectors in the plane. Fortunately the complex numbers come to the rescue, via the following computation:

[[math]] \begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{i} =\binom{\cos t-i\sin t}{i\cos t+\sin t} =e^{-it}\binom{1}{i} [[/math]]


We have as well a second complex eigenvector, coming from:

[[math]] \begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{-i} =\binom{\cos t+i\sin t}{-i\cos t+\sin t} =e^{it}\binom{1}{-i} [[/math]]


Thus, we are led to the conclusion in the statement.
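
The above diagonalization formula can also be double-checked numerically, as in the following sketch, assuming numpy, which simply multiplies out the three matrices in the statement, for an arbitrary angle:

```python
import numpy as np

t = 0.7
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

P = np.array([[1,  1],
              [1j, -1j]])
D = np.diag([np.exp(-1j * t), np.exp(1j * t)])
Pinv = 0.5 * np.array([[1, -1j],
                       [1,  1j]])          # the 1/2 factor makes this P^{-1}

print(np.allclose(Pinv @ P, np.eye(2)))    # True
print(np.allclose(R, P @ D @ Pinv))        # True
```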

1c. Matrix tricks

At the level of basic examples of diagonalizable matrices, we first have the following result, which provides us with the “generic” examples:

Theorem

For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent,

  • The eigenvalues are different, [math]\lambda_i\neq\lambda_j[/math],
  • The characteristic polynomial [math]P[/math] has simple roots,
  • The characteristic polynomial satisfies [math](P,P')=1[/math],
  • The resultant of [math]P,P'[/math] is nonzero, [math]R(P,P')\neq0[/math],
  • The discriminant of [math]P[/math] is nonzero, [math]\Delta(P)\neq0[/math],

and in this case, the matrix is diagonalizable.


Show Proof

The last assertion holds indeed, due to Theorem 1.9. As for the equivalences in the statement, these are all standard, the idea for their proofs, along with some more theory needed for using the present result in practice, being as follows:


[math](1)\iff(2)[/math] This follows from Theorem 1.9.


[math](2)\iff(3)[/math] This is standard, the double roots of [math]P[/math] being roots of [math]P'[/math].


[math](3)\iff(4)[/math] The idea here is that associated to any two polynomials [math]P,Q[/math] is their resultant [math]R(P,Q)[/math], which checks whether [math]P,Q[/math] have a common root. Let us write:

[[math]] P=c(X-a_1)\ldots(X-a_k) [[/math]]

[[math]] Q=d(X-b_1)\ldots(X-b_l) [[/math]]


We can define then the resultant as being the following quantity:

[[math]] R(P,Q)=c^ld^k\prod_{ij}(a_i-b_j) [[/math]]


The point now, that we will explain as well, is that this is a polynomial in the coefficients of [math]P,Q[/math], with integer coefficients. Indeed, this can be checked as follows:


-- We can expand the formula of [math]R(P,Q)[/math], and in what regards [math]a_1,\ldots,a_k[/math], which are the roots of [math]P[/math], we obtain in this way certain symmetric functions in these variables, which will be therefore polynomials in the coefficients of [math]P[/math], with integer coefficients.


-- We can then look what happens with respect to the remaining variables [math]b_1,\ldots,b_l[/math], which are the roots of [math]Q[/math]. Once again what we have here are certain symmetric functions, and so polynomials in the coefficients of [math]Q[/math], with integer coefficients.


-- Thus, we are led to the above conclusion, that [math]R(P,Q)[/math] is a polynomial in the coefficients of [math]P,Q[/math], with integer coefficients, and with the remark that the [math]c^ld^k[/math] factor is there for these latter coefficients to be indeed integers, instead of rationals.


Alternatively, let us write our two polynomials in usual form, as follows:

[[math]] P=p_kX^k+\ldots+p_1X+p_0 [[/math]]

[[math]] Q=q_lX^l+\ldots+q_1X+q_0 [[/math]]


The corresponding resultant appears then as the determinant of an associated matrix, having size [math]k+l[/math], and having [math]0[/math] coefficients at the blank spaces, as follows:

[[math]] R(P,Q)= \begin{vmatrix} p_k&&&q_l\\ \vdots&\ddots&&\vdots&\ddots\\ p_0&&p_k&q_0&&q_l\\ &\ddots&\vdots&&\ddots&\vdots\\ &&p_0&&&q_0 \end{vmatrix} [[/math]]


[math](4)\iff(5)[/math] Once again this is something standard, the idea here being that the discriminant [math]\Delta(P)[/math] of a polynomial [math]P\in\mathbb C[X][/math] is, modulo scalars, the resultant [math]R(P,P')[/math]. To be more precise, let us write our polynomial as follows:

[[math]] P(X)=cX^N+dX^{N-1}+\ldots [[/math]]


Its discriminant is then defined as being the following quantity:

[[math]] \Delta(P)=\frac{(-1)^{\binom{N}{2}}}{c}R(P,P') [[/math]]


This is a polynomial in the coefficients of [math]P[/math], with integer coefficients, with the division by [math]c[/math] being indeed possible over [math]\mathbb Z[/math], and with the sign being there for various reasons, including the compatibility with some well-known formulae, at small values of [math]N[/math].

All the above might seem a bit complicated, so as an illustration, let us work out an example. Consider the case of a polynomial of degree 2, and a polynomial of degree 1:

[[math]] P=ax^2+bx+c\quad,\quad Q=dx+e [[/math]]


In order to compute the resultant, let us factorize our polynomials:

[[math]] P=a(x-p)(x-q)\quad,\quad Q=d(x-r) [[/math]]


The resultant can be then computed as follows, by using the two-step method:

[[math]] \begin{eqnarray*} R(P,Q) &=&ad^2(p-r)(q-r)\\ &=&ad^2(pq-(p+q)r+r^2)\\ &=&cd^2+bd^2r+ad^2r^2\\ &=&cd^2-bde+ae^2 \end{eqnarray*} [[/math]]


Observe that [math]R(P,Q)=0[/math] corresponds indeed to the fact that [math]P,Q[/math] have a common root. Indeed, the root of [math]Q[/math] is [math]r=-e/d[/math], and we have:

[[math]] P(r) =\frac{ae^2}{d^2}-\frac{be}{d}+c =\frac{R(P,Q)}{d^2} [[/math]]


We can recover as well the resultant as a determinant, as follows:

[[math]] R(P,Q) =\begin{vmatrix} a&d&0\\ b&e&d\\ c&0&e \end{vmatrix} =ae^2-bde+cd^2 [[/math]]


Finally, in what regards the discriminant, let us see what happens in degree 2. Here we must compute the resultant of the following two polynomials:

[[math]] P=aX^2+bX+c\quad,\quad P'=2aX+b [[/math]]


The resultant is then given by the following formula:

[[math]] \begin{eqnarray*} R(P,P') &=&ab^2-b(2a)b+c(2a)^2\\ &=&4a^2c-ab^2\\ &=&-a(b^2-4ac) \end{eqnarray*} [[/math]]


Now by doing the discriminant normalizations, we obtain, as we should:

[[math]] \Delta(P)=b^2-4ac [[/math]]
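
For the reader wanting to experiment with such computations, here is a small symbolic sketch; we assume here the resultant and discriminant helpers of the sympy library, which follow the standard conventions used above, up to the normalizations that we discussed:

```python
# A symbolic check of the degree 2 computations above, assuming sympy
from sympy import symbols, resultant, discriminant

x, a, b, c, d, e = symbols('x a b c d e')

P = a*x**2 + b*x + c
Q = d*x + e
print(resultant(P, Q, x).expand())            # a*e**2 - b*d*e + c*d**2
print(resultant(P, P.diff(x), x).expand())    # 4*a**2*c - a*b**2
print(discriminant(P, x))                     # b**2 - 4*a*c
```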


As already mentioned, one can prove that the matrices having distinct eigenvalues are “generic”, and so the above result basically captures the whole situation. We have in fact the following collection of density results, which are quite advanced:

Theorem

The following happen, inside [math]M_N(\mathbb C)[/math]:

  • The invertible matrices are dense.
  • The matrices having distinct eigenvalues are dense.
  • The diagonalizable matrices are dense.


Show Proof

These are quite advanced results, which can be proved as follows:


(1) This is clear, intuitively speaking, because the invertible matrices are given by the condition [math]\det A\neq 0[/math]. Thus, the set formed by these matrices appears as the complement of the hypersurface [math]\det A=0[/math], and so must be dense inside [math]M_N(\mathbb C)[/math], as claimed.


(2) Here we can use a similar argument, this time by saying that the set formed by the matrices having distinct eigenvalues appears as the complement of the hypersurface given by [math]\Delta(P_A)=0[/math], and so must be dense inside [math]M_N(\mathbb C)[/math], as claimed.


(3) This follows from (2), via the fact that the matrices having distinct eigenvalues are diagonalizable, that we know from Theorem 1.12. There are of course some other proofs as well, for instance by putting the matrix in Jordan form.

As an application of the above results, and of our methods in general, we have:

Theorem

The following happen:

  • We have [math]P_{AB}=P_{BA}[/math], for any two matrices [math]A,B\in M_N(\mathbb C)[/math].
  • [math]AB,BA[/math] have the same eigenvalues, with the same multiplicities.
  • If [math]A[/math] has eigenvalues [math]\lambda_1,\ldots,\lambda_N[/math], then [math]f(A)[/math] has eigenvalues [math]f(\lambda_1),\ldots,f(\lambda_N)[/math].


Show Proof

These results can be deduced by using Theorem 1.13, as follows:


(1) It follows from definitions that the characteristic polynomial of a matrix is invariant under conjugation, in the sense that we have the following formula:

[[math]] P_C=P_{ACA^{-1}} [[/math]]


Now observe that, when assuming that [math]A[/math] is invertible, we have:

[[math]] AB=A(BA)A^{-1} [[/math]]


Thus, we have the result when [math]A[/math] is invertible. By using now Theorem 1.13 (1), we conclude that this formula holds for any matrix [math]A[/math], by continuity.


(2) This is a reformulation of (1), via the fact that [math]P[/math] encodes the eigenvalues, with multiplicities. Observe that proving this statement directly, with bare hands, would be quite difficult.


(3) This is something quite informal, clear for the diagonal matrices [math]D[/math], then for the diagonalizable matrices [math]PDP^{-1}[/math], and finally for all matrices, by using Theorem 1.13 (3), provided that [math]f[/math] has suitable regularity properties. We will be back to this.
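
Here is a quick numerical illustration of (1,2) and (3), assuming numpy, using its characteristic polynomial routine and the polynomial [math]f(x)=x^2+1[/math] as a test function; this is of course only a sanity check, not a proof:

```python
import numpy as np

N = 4
rng = np.random.default_rng(3)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

# (1,2): AB and BA have the same characteristic polynomial, hence eigenvalues
print(np.allclose(np.poly(A @ B), np.poly(B @ A)))            # True

# (3): the eigenvalues of f(A) are the f(lambda_i), here with f(x) = x^2 + 1
vals = np.linalg.eigvals(A)
fvals = np.linalg.eigvals(A @ A + np.eye(N))
print(np.allclose(np.sort_complex(vals**2 + 1), np.sort_complex(fvals)))
```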

Let us go back to the main problem raised by the diagonalization procedure, namely the computation of the roots of characteristic polynomials. We have here:

Theorem

The complex eigenvalues of a matrix [math]A\in M_N(\mathbb C)[/math], counted with multiplicities, have the following properties:

  • Their sum is the trace.
  • Their product is the determinant.


Show Proof

Consider indeed the characteristic polynomial [math]P[/math] of the matrix:

[[math]] \begin{eqnarray*} P(X) &=&\det(A-X1_N)\\ &=&(-1)^NX^N+(-1)^{N-1}Tr(A)X^{N-1}+\ldots+\det(A) \end{eqnarray*} [[/math]]


We can factorize this polynomial, by using its [math]N[/math] complex roots, and we obtain:

[[math]] \begin{eqnarray*} P(X) &=&(-1)^N(X-\lambda_1)\ldots(X-\lambda_N)\\ &=&(-1)^NX^N+(-1)^{N-1}\left(\sum_i\lambda_i\right)X^{N-1}+\ldots+\prod_i\lambda_i \end{eqnarray*} [[/math]]


Thus, we are led to the conclusion in the statement.

Regarding now the intermediate terms, we have here:

Theorem

Assume that [math]A\in M_N(\mathbb C)[/math] has eigenvalues [math]\lambda_1,\ldots,\lambda_N\in\mathbb C[/math], counted with multiplicities. The basic symmetric functions of these eigenvalues, namely

[[math]] c_k=\sum_{i_1 \lt \ldots \lt i_k}\lambda_{i_1}\ldots\lambda_{i_k} [[/math]]
are then given by the fact that the characteristic polynomial of the matrix is:

[[math]] P(X)=(-1)^N\sum_{k=0}^N(-1)^kc_kX^{N-k} [[/math]]
Moreover, all symmetric functions of the eigenvalues, such as the sums of powers

[[math]] d_s=\lambda_1^s+\ldots+\lambda_N^s [[/math]]
appear as polynomials in these characteristic polynomial coefficients [math]c_k[/math].


Show Proof

These results can be proved by doing some algebra, as follows:


(1) Consider indeed the characteristic polynomial [math]P[/math] of the matrix, factorized by using its [math]N[/math] complex roots, taken with multiplicities. By expanding, we obtain:

[[math]] \begin{eqnarray*} P(X) &=&(-1)^N(X-\lambda_1)\ldots(X-\lambda_N)\\ &=&(-1)^NX^N+(-1)^{N-1}\left(\sum_i\lambda_i\right)X^{N-1}+\ldots+\prod_i\lambda_i\\ &=&(-1)^NX^N+(-1)^{N-1}c_1X^{N-1}+\ldots+(-1)^0c_N\\ &=&(-1)^N\left(X^N-c_1X^{N-1}+\ldots+(-1)^Nc_N\right) \end{eqnarray*} [[/math]]


With the convention [math]c_0=1[/math], we are led to the conclusion in the statement.


(2) This is something standard, coming by doing some abstract algebra. Working out the formulae for the sums of powers [math]d_s=\sum_i\lambda_i^s[/math], at small values of the exponent [math]s\in\mathbb N[/math], is an excellent exercise, which shows how to proceed in general, by recurrence.
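
As an illustration, the first two Newton identities are [math]d_1=c_1[/math] and [math]d_2=c_1^2-2c_2[/math], and these can be checked numerically as follows, in a small sketch assuming numpy, with randomly chosen “eigenvalues”:

```python
import numpy as np

rng = np.random.default_rng(4)
lam = rng.standard_normal(5) + 1j * rng.standard_normal(5)    # "eigenvalues"

c1 = np.sum(lam)                                              # c_1
c2 = sum(lam[i] * lam[j] for i in range(5) for j in range(i + 1, 5))  # c_2

d1 = np.sum(lam)                                              # d_1
d2 = np.sum(lam**2)                                           # d_2

# First two Newton identities: d_1 = c_1 and d_2 = c_1^2 - 2 c_2
print(np.isclose(d1, c1), np.isclose(d2, c1**2 - 2*c2))       # True True
```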

1d. Spectral theorems

Let us go back now to the diagonalization question. Here is a key result:

Theorem

Any matrix [math]A\in M_N(\mathbb C)[/math] which is self-adjoint, [math]A=A^*[/math], is diagonalizable, with the diagonalization being of the following type,

[[math]] A=UDU^* [[/math]]
with [math]U\in U_N[/math], and with [math]D\in M_N(\mathbb R)[/math] diagonal. The converse holds too.


Show Proof

As a first remark, the converse trivially holds, because if we take a matrix of the form [math]A=UDU^*[/math], with [math]U[/math] unitary and [math]D[/math] diagonal and real, then we have:

[[math]] \begin{eqnarray*} A^* &=&(UDU^*)^*\\ &=&UD^*U^*\\ &=&UDU^*\\ &=&A \end{eqnarray*} [[/math]]


In the other sense now, assume that [math]A[/math] is self-adjoint, [math]A=A^*[/math]. Our first claim is that the eigenvalues are real. Indeed, assuming [math]Av=\lambda v[/math], we have:

[[math]] \begin{eqnarray*} \lambda \lt v,v \gt &=& \lt \lambda v,v \gt \\ &=& \lt Av,v \gt \\ &=& \lt v,Av \gt \\ &=& \lt v,\lambda v \gt \\ &=&\bar{\lambda} \lt v,v \gt \end{eqnarray*} [[/math]]


Thus we obtain [math]\lambda\in\mathbb R[/math], as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

[[math]] Av=\lambda v\quad,\quad Aw=\mu w [[/math]]


We have then the following computation, using [math]\lambda,\mu\in\mathbb R[/math]:

[[math]] \begin{eqnarray*} \lambda \lt v,w \gt &=& \lt \lambda v,w \gt \\ &=& \lt Av,w \gt \\ &=& \lt v,Aw \gt \\ &=& \lt v,\mu w \gt \\ &=&\mu \lt v,w \gt \end{eqnarray*} [[/math]]


Thus [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math], as claimed. In order now to finish the proof, it remains to prove that the eigenspaces of [math]A[/math] span the whole space [math]\mathbb C^N[/math]. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix:

[[math]] Av=\lambda v [[/math]]


Assuming now that we have a vector [math]w[/math] orthogonal to it, [math]v\perp w[/math], we have:

[[math]] \begin{eqnarray*} \lt Aw,v \gt &=& \lt w,Av \gt \\ &=& \lt w,\lambda v \gt \\ &=&\lambda \lt w,v \gt \\ &=&0 \end{eqnarray*} [[/math]]


Thus, if [math]v[/math] is an eigenvector, then the vector space [math]v^\perp[/math] is invariant under [math]A[/math]. Moreover, since a matrix [math]A[/math] is self-adjoint precisely when [math] \lt Av,v \gt \in\mathbb R[/math] for any vector [math]v\in\mathbb C^N[/math], as one can see by expanding the scalar product, the restriction of [math]A[/math] to the subspace [math]v^\perp[/math] is self-adjoint. Thus, we can proceed by recurrence, and we obtain the result.
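
Numerically, this theorem is what the Hermitian eigensolvers of the standard linear algebra libraries implement; here is a minimal sketch, assuming numpy, whose eigh routine returns precisely a real diagonal [math]D[/math] and a unitary [math]U[/math] with [math]A=UDU^*[/math]:

```python
import numpy as np

N = 4
rng = np.random.default_rng(5)
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = X + X.conj().T                  # a self-adjoint matrix, A = A*

# eigh is the Hermitian eigensolver: real eigenvalues, orthonormal eigenvectors
d, U = np.linalg.eigh(A)
print(d.dtype)                                         # float64: real spectrum
print(np.allclose(U @ np.diag(d) @ U.conj().T, A))     # A = U D U*
print(np.allclose(U.conj().T @ U, np.eye(N)))          # U is unitary
```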

As basic examples of self-adjoint matrices, we have the orthogonal projections. The diagonalization result regarding them is as follows:

Proposition

The matrices [math]P\in M_N(\mathbb C)[/math] which are projections,

[[math]] P^2=P^*=P [[/math]]
are precisely those which diagonalize as follows,

[[math]] P=UDU^* [[/math]]
with [math]U\in U_N[/math], and with [math]D\in M_N(0,1)[/math] being diagonal.


Show Proof

The equation for the projections being [math]P^2=P^*=P[/math], the eigenvalues [math]\lambda[/math] are real, and we have as well the following condition, coming from [math]P^2=P[/math]:

[[math]] \begin{eqnarray*} \lambda \lt v,v \gt &=& \lt \lambda v,v \gt \\ &=& \lt Pv,v \gt \\ &=& \lt P^2v,v \gt \\ &=& \lt Pv,Pv \gt \\ &=& \lt \lambda v,\lambda v \gt \\ &=&\lambda^2 \lt v,v \gt \end{eqnarray*} [[/math]]


Thus we obtain [math]\lambda\in\{0,1\}[/math], as claimed, and as a final conclusion here, the diagonalization of the projections is as follows, with [math]e_i\in \{0,1\}[/math]:

[[math]] P\sim\begin{pmatrix} e_1\\ &\ddots\\ &&e_N \end{pmatrix} [[/math]]


To be more precise, the number of [math]1[/math] values is the dimension of the image of [math]P[/math], and the number of [math]0[/math] values is the dimension of the space of vectors sent to [math]0[/math] by [math]P[/math].

An important class of self-adjoint matrices, which includes for instance all the projections, are the positive matrices. The theory here is as follows:

Theorem

For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent, and if they are satisfied, we say that [math]A[/math] is positive:

  • [math]A=B^2[/math], with [math]B=B^*[/math].
  • [math]A=CC^*[/math], for some [math]C\in M_N(\mathbb C)[/math].
  • [math] \lt Ax,x \gt \geq0[/math], for any vector [math]x\in\mathbb C^N[/math].
  • [math]A=A^*[/math], and the eigenvalues are positive, [math]\lambda_i\geq0[/math].
  • [math]A=UDU^*[/math], with [math]U\in U_N[/math] and with [math]D\in M_N(\mathbb R_+)[/math] diagonal.


Show Proof

The idea is that the equivalences in the statement basically follow from some elementary computations, with only Theorem 1.17 needed, at some point:


[math](1)\implies(2)[/math] This is clear, because we can take [math]C=B[/math].


[math](2)\implies(3)[/math] This follows from the following computation:

[[math]] \begin{eqnarray*} \lt Ax,x \gt &=& \lt CC^*x,x \gt \\ &=& \lt C^*x,C^*x \gt \\ &\geq&0 \end{eqnarray*} [[/math]]


[math](3)\implies(4)[/math] By using the fact that [math] \lt Ax,x \gt [/math] is real, we have:

[[math]] \begin{eqnarray*} \lt Ax,x \gt &=& \lt x,A^*x \gt \\ &=& \lt A^*x,x \gt \end{eqnarray*} [[/math]]


Thus we have [math]A=A^*[/math], and the remaining assertion, regarding the eigenvalues, follows from the following computation, assuming [math]Ax=\lambda x[/math]:

[[math]] \begin{eqnarray*} \lt Ax,x \gt &=& \lt \lambda x,x \gt \\ &=&\lambda \lt x,x \gt \\ &\geq&0 \end{eqnarray*} [[/math]]


[math](4)\implies(5)[/math] This follows indeed by using Theorem 1.17.


[math](5)\implies(1)[/math] Assuming [math]A=UDU^*[/math], with [math]U\in U_N[/math], and with [math]D\in M_N(\mathbb R_+)[/math] being diagonal, we can set [math]B=U\sqrt{D}U^*[/math]. Then [math]B[/math] is self-adjoint, and its square is given by:

[[math]] \begin{eqnarray*} B^2 &=&U\sqrt{D}U^*\cdot U\sqrt{D}U^*\\ &=&UDU^*\\ &=&A \end{eqnarray*} [[/math]]


Thus, we are led to the conclusion in the statement.
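
Here is a small sketch, assuming numpy, illustrating the equivalence between (2), (4) and (1): we produce a positive matrix as [math]CC^*[/math], check its spectrum, and construct a self-adjoint square root; the tolerance and clipping below are ad-hoc numerical safeguards:

```python
import numpy as np

N = 4
rng = np.random.default_rng(6)
C = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = C @ C.conj().T                  # positive, by criterion (2)

d, U = np.linalg.eigh(A)
print(np.all(d >= -1e-12))          # eigenvalues >= 0, criterion (4)

# A self-adjoint square root B with B^2 = A, as in (1) and (5)
B = U @ np.diag(np.sqrt(np.clip(d, 0, None))) @ U.conj().T
print(np.allclose(B @ B, A), np.allclose(B, B.conj().T))      # True True
```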

Let us record as well the following technical version of the above result:

Theorem

For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent, and if they are satisfied, we say that [math]A[/math] is strictly positive:

  • [math]A=B^2[/math], with [math]B=B^*[/math], invertible.
  • [math]A=CC^*[/math], for some [math]C\in M_N(\mathbb C)[/math] invertible.
  • [math] \lt Ax,x \gt \gt 0[/math], for any nonzero vector [math]x\in\mathbb C^N[/math].
  • [math]A=A^*[/math], and the eigenvalues are strictly positive, [math]\lambda_i \gt 0[/math].
  • [math]A=UDU^*[/math], with [math]U\in U_N[/math] and with [math]D\in M_N(\mathbb R_+^*)[/math] diagonal.


Show Proof

This follows either from Theorem 1.19, by adding the various extra assumptions in the statement, or from the proof of Theorem 1.19, by modifying where needed.

Let us discuss now the case of the unitary matrices. We have here:

Theorem

Any matrix [math]U\in M_N(\mathbb C)[/math] which is unitary, [math]U^*=U^{-1}[/math], is diagonalizable, with the eigenvalues on [math]\mathbb T[/math]. More precisely we have

[[math]] U=VDV^* [[/math]]
with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] diagonal. The converse holds too.


Show Proof

As a first remark, the converse trivially holds, because given a matrix of type [math]U=VDV^*[/math], with [math]V\in U_N[/math], and with [math]D\in M_N(\mathbb T)[/math] being diagonal, we have:

[[math]] \begin{eqnarray*} U^* &=&(VDV^*)^*\\ &=&VD^*V^*\\ &=&VD^{-1}V^{-1}\\ &=&(V^*)^{-1}D^{-1}V^{-1}\\ &=&(VDV^*)^{-1}\\ &=&U^{-1} \end{eqnarray*} [[/math]]


Let us prove now the first assertion, stating that the eigenvalues of a unitary matrix [math]U\in U_N[/math] belong to [math]\mathbb T[/math]. Indeed, assuming [math]Uv=\lambda v[/math], we have:

[[math]] \begin{eqnarray*} \lt v,v \gt &=& \lt U^*Uv,v \gt \\ &=& \lt Uv,Uv \gt \\ &=& \lt \lambda v,\lambda v \gt \\ &=&|\lambda|^2 \lt v,v \gt \end{eqnarray*} [[/math]]


Thus we obtain [math]\lambda\in\mathbb T[/math], as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

[[math]] Uv=\lambda v\quad,\quad Uw=\mu w [[/math]]


We have then the following computation, using [math]U^*=U^{-1}[/math] and [math]\lambda,\mu\in\mathbb T[/math]:

[[math]] \begin{eqnarray*} \lambda \lt v,w \gt &=& \lt \lambda v,w \gt \\ &=& \lt Uv,w \gt \\ &=& \lt v,U^*w \gt \\ &=& \lt v,U^{-1}w \gt \\ &=& \lt v,\mu^{-1}w \gt \\ &=&\mu \lt v,w \gt \end{eqnarray*} [[/math]]


Thus [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math], as claimed. In order now to finish the proof, it remains to prove that the eigenspaces of [math]U[/math] span the whole space [math]\mathbb C^N[/math]. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix:

[[math]] Uv=\lambda v [[/math]]


Assuming that we have a vector [math]w[/math] orthogonal to it, [math]v\perp w[/math], we have:

[[math]] \begin{eqnarray*} \lt Uw,v \gt &=& \lt w,U^*v \gt \\ &=& \lt w,U^{-1}v \gt \\ &=& \lt w,\lambda^{-1}v \gt \\ &=&\lambda \lt w,v \gt \\ &=&0 \end{eqnarray*} [[/math]]


Thus, if [math]v[/math] is an eigenvector, then the vector space [math]v^\perp[/math] is invariant under [math]U[/math]. Now since [math]U[/math] is an isometry, so is its restriction to this space [math]v^\perp[/math]. Thus this restriction is a unitary, and so we can proceed by recurrence, and we obtain the result.
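
As a numerical illustration, here is a sketch, assuming numpy, checking that a random unitary matrix has its eigenvalues on the unit circle, and diagonalizes over an orthonormal basis; since the eigenvalues of such a random unitary are distinct, the eigenvectors returned by the generic eigensolver come out numerically orthonormal:

```python
import numpy as np

N = 4
rng = np.random.default_rng(7)
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
U, _ = np.linalg.qr(X)              # a random unitary matrix

vals, V = np.linalg.eig(U)
print(np.allclose(np.abs(vals), 1))                     # eigenvalues on T
print(np.allclose(V.conj().T @ V, np.eye(N)))           # eigenbasis orthonormal
print(np.allclose(V @ np.diag(vals) @ V.conj().T, U))   # U = V D V*
```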

The self-adjoint matrices and the unitary matrices are particular cases of the general notion of a “normal matrix”, and we have here:

Theorem

Any matrix [math]A\in M_N(\mathbb C)[/math] which is normal, [math]AA^*=A^*A[/math], is diagonalizable, with the diagonalization being of the following type,

[[math]] A=UDU^* [[/math]]
with [math]U\in U_N[/math], and with [math]D\in M_N(\mathbb C)[/math] diagonal. The converse holds too.


Show Proof

As a first remark, the converse trivially holds, because if we take a matrix of the form [math]A=UDU^*[/math], with [math]U[/math] unitary and [math]D[/math] diagonal, then we have:

[[math]] \begin{eqnarray*} AA^* &=&UDU^*\cdot UD^*U^*\\ &=&UDD^*U^*\\ &=&UD^*DU^*\\ &=&UD^*U^*\cdot UDU^*\\ &=&A^*A \end{eqnarray*} [[/math]]


In the other sense now, this is something more technical. Our first claim is that a matrix [math]A[/math] is normal precisely when the following happens, for any vector [math]v[/math]:

[[math]] ||Av||=||A^*v|| [[/math]]


Indeed, the above equality can be written as follows:

[[math]] \lt AA^*v,v \gt = \lt A^*Av,v \gt [[/math]]


But this is equivalent to [math]AA^*=A^*A[/math], by expanding the scalar products. Our next claim is that [math]A,A^*[/math] have the same eigenvectors, with conjugate eigenvalues:

[[math]] Av=\lambda v\implies A^*v=\bar{\lambda}v [[/math]]


Indeed, this follows from the following computation, and from the trivial fact that if [math]A[/math] is normal, then so is any matrix of type [math]A-\lambda 1_N[/math]:

[[math]] \begin{eqnarray*} ||(A^*-\bar{\lambda}1_N)v|| &=&||(A-\lambda 1_N)^*v||\\ &=&||(A-\lambda 1_N)v||\\ &=&0 \end{eqnarray*} [[/math]]


Let us prove now, by using this, that the eigenspaces of [math]A[/math] are pairwise orthogonal. Assume that we have two eigenvectors, corresponding to different eigenvalues, [math]\lambda\neq\mu[/math]:

[[math]] Av=\lambda v\quad,\quad Aw=\mu w [[/math]]


We have the following computation, which shows that [math]\lambda\neq\mu[/math] implies [math]v\perp w[/math]:

[[math]] \begin{eqnarray*} \lambda \lt v,w \gt &=& \lt \lambda v,w \gt \\ &=& \lt Av,w \gt \\ &=& \lt v,A^*w \gt \\ &=& \lt v,\bar{\mu}w \gt \\ &=&\mu \lt v,w \gt \end{eqnarray*} [[/math]]


In order to finish, it remains to prove that the eigenspaces of [math]A[/math] span the whole [math]\mathbb C^N[/math]. This is something that we have already seen for the self-adjoint matrices, and for unitaries, and we will use here these results, in order to deal with the general normal case. As a first observation, given an arbitrary matrix [math]A[/math], the matrix [math]AA^*[/math] is self-adjoint:

[[math]] (AA^*)^*=AA^* [[/math]]


Thus, we can diagonalize this matrix [math]AA^*[/math], as follows, with the passage matrix being a unitary, [math]V\in U_N[/math], and with the diagonal form being real, [math]E\in M_N(\mathbb R)[/math]:

[[math]] AA^*=VEV^* [[/math]]


Now observe that, for matrices of type [math]A=UDU^*[/math], which are those that we expect to end up with, we have the following formulae:

[[math]] V=U\quad,\quad E=D\bar{D} [[/math]]


In particular, the matrices [math]A[/math] and [math]AA^*[/math] have the same eigenspaces. So, this will be our idea, proving that the eigenspaces of [math]AA^*[/math] are eigenspaces of [math]A[/math]. In order to do so, let us pick two eigenvectors [math]v,w[/math] of the matrix [math]AA^*[/math], corresponding to different eigenvalues, [math]\lambda\neq\mu[/math]. The eigenvalue equations are then as follows:

[[math]] AA^*v=\lambda v\quad,\quad AA^*w=\mu w [[/math]]


We have the following computation, using the normality condition [math]AA^*=A^*A[/math], and the fact that the eigenvalues of [math]AA^*[/math], and in particular [math]\mu[/math], are real:

[[math]] \begin{eqnarray*} \lambda \lt Av,w \gt &=& \lt \lambda Av,w \gt \\ &=& \lt A\lambda v,w \gt \\ &=& \lt AAA^*v,w \gt \\ &=& \lt AA^*Av,w \gt \\ &=& \lt Av,AA^*w \gt \\ &=& \lt Av,\mu w \gt \\ &=&\mu \lt Av,w \gt \end{eqnarray*} [[/math]]


We conclude that we have [math] \lt Av,w \gt =0[/math]. But this reformulates as follows:

[[math]] \lambda\neq\mu\implies A(E_\lambda)\perp E_\mu [[/math]]


Now since the eigenspaces of [math]AA^*[/math] are pairwise orthogonal, and span the whole [math]\mathbb C^N[/math], we deduce from this that these eigenspaces are invariant under [math]A[/math]:

[[math]] A(E_\lambda)\subset E_\lambda [[/math]]


But with this result in hand, we can finish. Indeed, we can decompose the problem, and the matrix [math]A[/math] itself, following these eigenspaces of [math]AA^*[/math], which in practice amounts to saying that we can assume that we only have [math]1[/math] eigenspace. Now by rescaling, this is the same as assuming that we have [math]AA^*=1[/math]. But with this, we are now in the unitary case, that we know how to solve, as explained in Theorem 1.21, and so we are done.

As a first application, we have the following result:

Theorem

Given a matrix [math]A\in M_N(\mathbb C)[/math], we can construct a matrix [math]|A|[/math] as follows, by using the fact that [math]A^*A[/math] is diagonalizable, with positive eigenvalues:

[[math]] |A|=\sqrt{A^*A} [[/math]]
This matrix [math]|A|[/math] is then positive, and its square is [math]|A|^2=A^*A[/math]. In the case [math]N=1[/math], we obtain in this way the usual absolute value of the complex numbers.


Show Proof

Consider indeed the matrix [math]A^*A[/math], which is normal. According to Theorem 1.22, we can diagonalize this matrix as follows, with [math]U\in U_N[/math], and with [math]D[/math] diagonal:

[[math]] A^*A=UDU^* [[/math]]


From [math]A^*A\geq0[/math] we obtain [math]D\geq0[/math]. But this means that the entries of [math]D[/math] are real, and positive. Thus we can extract the square root [math]\sqrt{D}[/math], and then set:

[[math]] \sqrt{A^*A}=U\sqrt{D}U^* [[/math]]


Thus, we are basically done. Indeed, if we call this latter matrix [math]|A|[/math], then we are led to the conclusions in the statement. Finally, the last assertion is clear from definitions.
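
Here is a minimal numerical sketch of this construction, assuming numpy, which computes [math]|A|=\sqrt{A^*A}[/math] via the diagonalization of [math]A^*A[/math], exactly as in the proof above; the function name absolute_value is of course our own:

```python
import numpy as np

def absolute_value(A):
    """|A| = sqrt(A*A), computed via the diagonalization of A*A."""
    d, U = np.linalg.eigh(A.conj().T @ A)        # A*A = U D U*, with D >= 0
    return U @ np.diag(np.sqrt(np.clip(d, 0, None))) @ U.conj().T

rng = np.random.default_rng(8)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
M = absolute_value(A)
print(np.allclose(M @ M, A.conj().T @ A))        # |A|^2 = A*A
print(np.allclose(M, M.conj().T))                # |A| is self-adjoint

print(absolute_value(np.array([[3 - 4j]])))      # [[5.+0.j]], at N = 1
```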

We can now formulate a first polar decomposition result, as follows:

Theorem

Any invertible matrix [math]A\in M_N(\mathbb C)[/math] decomposes as

[[math]] A=U|A| [[/math]]
with [math]U\in U_N[/math], and with [math]|A|=\sqrt{A^*A}[/math] as above.


Show Proof

This is routine, and follows by comparing the actions of [math]A,|A|[/math] on the vectors [math]v\in\mathbb C^N[/math], and deducing from this the existence of a unitary [math]U\in U_N[/math] as above. We will be back to this, later on, directly in the case of the linear operators on Hilbert spaces.
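
Here is a rough numerical sketch of this polar decomposition, for an invertible matrix, assuming numpy: we first compute [math]|A|[/math] as before, and then set [math]U=A|A|^{-1}[/math]; this is only an illustration of the statement, not a numerically optimal algorithm:

```python
import numpy as np

def polar(A):
    """Polar decomposition A = U |A|, for A invertible (numerical sketch)."""
    d, V = np.linalg.eigh(A.conj().T @ A)
    absA = V @ np.diag(np.sqrt(np.clip(d, 0, None))) @ V.conj().T   # |A|
    U = A @ np.linalg.inv(absA)                                     # A |A|^{-1}
    return U, absA

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, absA = polar(A)
print(np.allclose(U @ absA, A))                    # A = U |A|
print(np.allclose(U.conj().T @ U, np.eye(3)))      # U is unitary
```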

Observe that at [math]N=1[/math] we obtain in this way the usual polar decomposition of the nonzero complex numbers. More generally now, we have the following result:

Theorem

Any square matrix [math]A\in M_N(\mathbb C)[/math] decomposes as

[[math]] A=U|A| [[/math]]
with [math]U[/math] being a partial isometry, and with [math]|A|=\sqrt{A^*A}[/math] as above.


Show Proof

Again, this follows by comparing the actions of [math]A,|A|[/math] on the vectors [math]v\in\mathbb C^N[/math], and deducing from this the existence of a partial isometry [math]U[/math] as above. Alternatively, we can get this from Theorem 1.24, applied on the complement of the 0-eigenvectors.

This was for our basic presentation of linear algebra. There are of course many other things that can be said, but we will come back to some of them in what follows, directly in the case of the linear operators on the arbitrary Hilbert spaces.

General references

Banica, Teo (2024). "Principles of operator algebras". arXiv:2208.03600 [math.OA].

References

  1. W. Rudin, Real and complex analysis, McGraw-Hill (1966).