Linear maps



9a. Linear maps

In the remainder of this book we study the functions of several variables. Things here are quite tricky, and we will do this slowly, over a whole 200 pages. Let us start with the linear functions, which are the simplest. We have here the following result:

Theorem

The linear maps [math]f:\mathbb R^N\to\mathbb R^M[/math] are in correspondence with the matrices [math]A\in M_{M\times N}(\mathbb R)[/math], with the linear map associated to such a matrix being

[[math]] f(x)=Ax [[/math]]
and with the matrix associated to a linear map being [math]A_{ij}= \lt f(e_j),e_i \gt [/math]. Similarly, the linear maps [math]f:\mathbb C^N\to\mathbb C^M[/math] are in correspondence with the matrices [math]A\in M_{M\times N}(\mathbb C)[/math].


Show Proof

The first assertion is clear, because a linear map [math]f:\mathbb R^N\to\mathbb R^M[/math] must send a vector [math]x\in\mathbb R^N[/math] to a certain vector [math]f(x)\in\mathbb R^M[/math], all of whose components are linear combinations of the components of [math]x[/math]. Thus, we can write, for certain numbers [math]A_{ij}\in\mathbb R[/math]:

[[math]] f\begin{pmatrix} x_1\\ \vdots\\ x_N \end{pmatrix} =\begin{pmatrix} A_{11}x_1+\ldots+A_{1N}x_N\\ \vdots\\ A_{M1}x_1+\ldots+A_{MN}x_N \end{pmatrix} [[/math]]


Now the parameters [math]A_{ij}\in\mathbb R[/math] can be regarded as being the entries of a rectangular matrix [math]A\in M_{M\times N}(\mathbb R)[/math], and with the usual convention for the rectangular matrix multiplication, the above formula is precisely the one in the statement, namely:

[[math]] f(x)=Ax [[/math]]


Regarding the second assertion, with [math]f(x)=Ax[/math] as above, if we denote by [math]e_1,\ldots,e_N[/math] the standard basis of [math]\mathbb R^N[/math], then we have the following formula:

[[math]] f(e_j) =\begin{pmatrix} A_{1j}\\ \vdots\\ A_{Mj} \end{pmatrix} [[/math]]


But this gives [math] \lt f(e_j),e_i \gt =A_{ij}[/math], as desired. As for the last assertion, regarding the complex maps and matrices, the proof here is similar, and with the complex scalar products being by definition given, here and in what follows, by [math] \lt x,y \gt =\sum_ix_i\bar{y}_i[/math].
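
As a quick illustration of this correspondence, consider for instance the linear map [math]f:\mathbb R^3\to\mathbb R^2[/math] given by [math]f(x,y,z)=(x+2y+3z,4x+5y+6z)[/math]. In matrix form, this map is given by:

[[math]] f\begin{pmatrix}x\\ y\\ z\end{pmatrix}=\begin{pmatrix}1&2&3\\ 4&5&6\end{pmatrix}\begin{pmatrix}x\\ y\\ z\end{pmatrix} [[/math]]


Observe that we have for instance [math]f(e_2)=\binom{2}{5}[/math], and so [math] \lt f(e_2),e_1 \gt =2[/math], in agreement with the formula [math]A_{ij}= \lt f(e_j),e_i \gt [/math] from the above statement.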

At the level of examples, let us first discuss the linear maps [math]f:\mathbb R^2\to\mathbb R^2[/math]. We have:

Proposition

The rotation of angle [math]t\in\mathbb R[/math], and the symmetry with respect to the [math]Ox[/math] axis rotated by an angle [math]t/2\in\mathbb R[/math], are given by the matrices

[[math]] R_t=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\quad,\quad S_t=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix} [[/math]]
both depending on [math]t\in\mathbb R[/math] taken modulo [math]2\pi[/math].


Show Proof

The rotation being linear, it must correspond to a certain matrix:

[[math]] R_t=\begin{pmatrix}a&b\\ c&d\end{pmatrix} [[/math]]


We can guess this matrix, via its action on the basic coordinate vectors [math]\binom{1}{0}[/math] and [math]\binom{0}{1}[/math]. Indeed, a quick picture in the plane shows that we must have:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}1\\ 0\end{pmatrix}= \begin{pmatrix}\cos t\\ \sin t\end{pmatrix}\quad,\quad \begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}0\\ 1\end{pmatrix}= \begin{pmatrix}-\sin t\\ \cos t\end{pmatrix} [[/math]]


Guessing now the matrix is not complicated, because the first equality gives us the first column, and the second equality gives us the second column:

[[math]] \binom{a}{c}=\begin{pmatrix}\cos t\\ \sin t\end{pmatrix}\quad,\quad \binom{b}{d}=\begin{pmatrix}-\sin t\\ \cos t\end{pmatrix} [[/math]]


Thus, we can just put together these two vectors, and we obtain our matrix [math]R_t[/math]. As for the symmetry, the proof here is similar, again by computing [math]S_t\binom{1}{0}[/math] and [math]S_t\binom{0}{1}[/math].
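
As an illustration, at [math]t=\pi/2[/math] these formulae give:

[[math]] R_{\pi/2}=\begin{pmatrix}0&-1\\ 1&0\end{pmatrix}\quad,\quad S_{\pi/2}=\begin{pmatrix}0&1\\ 1&0\end{pmatrix} [[/math]]


Indeed, [math]R_{\pi/2}[/math] sends [math]\binom{1}{0}\to\binom{0}{1}[/math] and [math]\binom{0}{1}\to\binom{-1}{0}[/math], which is the rotation of angle [math]\pi/2[/math], and [math]S_{\pi/2}[/math] interchanges the two coordinates, which is the symmetry with respect to the [math]Ox[/math] axis rotated by [math]\pi/4[/math], namely the main diagonal.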

Let us record as well a result regarding the projections, as follows:

Proposition

The projection on the [math]Ox[/math] axis rotated by an angle [math]t/2\in\mathbb R[/math] is

[[math]] P_t=\frac{1}{2}\begin{pmatrix}1+\cos t&\sin t\\ \sin t&1-\cos t\end{pmatrix} [[/math]]
depending on [math]t\in\mathbb R[/math] taken modulo [math]2\pi[/math].


Show Proof

A quick picture in the plane, using similarity of triangles, and the basic trigonometry formulae for the duplication of angles, show that we must have:

[[math]] P_t\begin{pmatrix}1\\ 0\end{pmatrix} =\cos\frac{t}{2}\binom{\cos\frac{t}{2}}{\sin\frac{t}{2}} =\frac{1}{2}\begin{pmatrix}1+\cos t\\ \sin t\end{pmatrix} [[/math]]


Similarly, another quick picture plus trigonometry show that we must have:

[[math]] P_t\begin{pmatrix}0\\ 1\end{pmatrix} =\sin\frac{t}{2}\binom{\cos\frac{t}{2}}{\sin\frac{t}{2}} =\frac{1}{2}\begin{pmatrix}\sin t\\1-\cos t\end{pmatrix} [[/math]]


Now by putting together these two vectors, we obtain our matrix [math]P_t[/math], as in the statement.
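
As a verification, observe that the matrix [math]P_t[/math] found above satisfies [math]P_t^2=P_t[/math], as any projection should. Indeed, by using [math]\cos^2t+\sin^2t=1[/math], we have:

[[math]] P_t^2=\frac{1}{4}\begin{pmatrix}(1+\cos t)^2+\sin^2t&(1+\cos t)\sin t+\sin t(1-\cos t)\\ \sin t(1+\cos t)+(1-\cos t)\sin t&\sin^2t+(1-\cos t)^2\end{pmatrix} =\frac{1}{2}\begin{pmatrix}1+\cos t&\sin t\\ \sin t&1-\cos t\end{pmatrix}=P_t [[/math]]


Observe also that [math]P_t[/math] is symmetric, [math]P_t=P_t^t[/math], a fact which will be given a meaning in a moment, when talking about orthogonal projections.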

Back to Theorem 9.1, our claim is that, no matter what we want to do with [math]f[/math] or [math]A[/math], we will run at some point into their adjoints [math]f^*[/math] and [math]A^*[/math], constructed as follows:

Theorem

The adjoint linear map [math]f^*:\mathbb C^N\to\mathbb C^N[/math], which is given by

[[math]] \lt f(x),y \gt = \lt x,f^*(y) \gt [[/math]]
corresponds to the adjoint matrix [math]A^*\in M_N(\mathbb C)[/math], given by

[[math]] (A^*)_{ij}=\bar{A}_{ji} [[/math]]
via the correspondence between linear maps and matrices constructed above.


Show Proof

Given a linear map [math]f:\mathbb C^N\to\mathbb C^N[/math], fix [math]y\in\mathbb C^N[/math], and consider the linear form [math]\varphi(x)= \lt f(x),y \gt [/math]. This form must be as follows, for a certain vector [math]f^*(y)\in\mathbb C^N[/math]:

[[math]] \varphi(x)= \lt x,f^*(y) \gt [[/math]]


Thus, we have constructed a map [math]y\to f^*(y)[/math] as in the statement, which is obviously linear, and that we can call [math]f^*[/math]. Now by taking the vectors [math]x,y\in\mathbb C^N[/math] to be elements of the standard basis of [math]\mathbb C^N[/math], our defining formula for [math]f^*[/math] reads:

[[math]] \lt f(e_i),e_j \gt = \lt e_i,f^*(e_j) \gt [[/math]]


By reversing the scalar product on the right, this formula can be written as:

[[math]] \lt f^*(e_j),e_i \gt =\overline{ \lt f(e_i),e_j \gt } [[/math]]


But this means that the matrix of [math]f^*[/math] is given by [math](A^*)_{ij}=\bar{A}_{ji}[/math], as desired.
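
As an example, over [math]\mathbb C[/math] the adjoint operation transposes the matrix, and conjugates the entries. For instance:

[[math]] A=\begin{pmatrix}1&i\\ 2&3-i\end{pmatrix} \implies A^*=\begin{pmatrix}1&2\\ -i&3+i\end{pmatrix} [[/math]]


Observe also that for the real matrices [math]A\in M_N(\mathbb R)[/math] the adjoint is simply the transpose, [math]A^*=A^t[/math], obtained by flipping the matrix with respect to its diagonal.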

Getting back to our claim, the adjoints [math]*[/math] are indeed ubiquitous, as shown by:

Theorem

The following happen:

  • [math]f(x)=Ux[/math] with [math]U\in M_N(\mathbb C)[/math] is an isometry precisely when [math]U^*=U^{-1}[/math].
  • [math]f(x)=Px[/math] with [math]P\in M_N(\mathbb C)[/math] is an orthogonal projection precisely when [math]P^2=P^*=P[/math].


Show Proof

Let us first recall that the lengths, or norms, of the vectors [math]x\in\mathbb C^N[/math] can be recovered from the knowledge of the scalar products, as follows:

[[math]] ||x||=\sqrt{ \lt x,x \gt } [[/math]]


Conversely, we can recover the scalar products out of the norms, by using the following difficult-to-remember formula, called the complex polarization identity:

[[math]] \begin{eqnarray*} &&||x+y||^2-||x-y||^2+i||x+iy||^2-i||x-iy||^2\\ &=&||x||^2+||y||^2-||x||^2-||y||^2+i||x||^2+i||y||^2-i||x||^2-i||y||^2\\ &&+2Re( \lt x,y \gt )+2Re( \lt x,y \gt )+2iIm( \lt x,y \gt )+2iIm( \lt x,y \gt )\\ &=&4 \lt x,y \gt \end{eqnarray*} [[/math]]


Finally, we will use Theorem 9.4, and more specifically the following formula coming from there, valid for any matrix [math]A\in M_N(\mathbb C)[/math] and any two vectors [math]x,y\in\mathbb C^N[/math]:

[[math]] \lt Ax,y \gt = \lt x,A^*y \gt [[/math]]


(1) Given a matrix [math]U\in M_N(\mathbb C)[/math], we have indeed the following equivalences, with the first one coming from the polarization identity, and with the other ones being clear:

[[math]] \begin{eqnarray*} ||Ux||=||x|| &\iff& \lt Ux,Uy \gt = \lt x,y \gt \\ &\iff& \lt x,U^*Uy \gt = \lt x,y \gt \\ &\iff&U^*Uy=y\\ &\iff&U^*U=1\\ &\iff&U^*=U^{-1} \end{eqnarray*} [[/math]]


(2) Given a matrix [math]P\in M_N(\mathbb C)[/math], in order for [math]x\to Px[/math] to be an oblique projection, we must have [math]P^2=P[/math]. Now observe that this projection is orthogonal when:

[[math]] \begin{eqnarray*} \lt Px-x,Py \gt =0 &\iff& \lt P^*Px-P^*x,y \gt =0\\ &\iff&P^*Px-P^*x=0\\ &\iff&P^*P-P^*=0\\ &\iff&P^*P=P^* \end{eqnarray*} [[/math]]


The point now is that by conjugating the last formula, we obtain [math]P^*P=P[/math]. Thus we must have [math]P=P^*[/math], and this gives the result.
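
As an illustration, the rotation matrices [math]R_t[/math] from Proposition 9.2 satisfy:

[[math]] R_t^*=R_t^t=\begin{pmatrix}\cos t&\sin t\\ -\sin t&\cos t\end{pmatrix}=R_{-t}=R_t^{-1} [[/math]]


Thus the rotations are indeed isometries. Similarly, the matrices [math]P_t[/math] from Proposition 9.3 satisfy [math]P_t^2=P_t^*=P_t[/math], so these are indeed orthogonal projections, in agreement with the above result.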

Summarizing, the linear operators come in pairs [math]T,T^*[/math], and the associated matrices come as well in pairs [math]A,A^*[/math]. We will keep this in mind, and come back to it later.

9b. Matrix inversion

We have seen so far that most of the interesting maps [math]f:\mathbb R^N\to\mathbb R^N[/math] that we know, such as the rotations, symmetries and projections, are linear, and can be written in the following form, with [math]A\in M_N(\mathbb R)[/math] being a square matrix:

[[math]] f(v)=Av [[/math]]


We develop now more general theory for such linear maps. We will be interested in the question of inverting the linear maps [math]f:\mathbb R^N\to\mathbb R^N[/math]. And the point is that this is the same question as inverting the corresponding matrices [math]A\in M_N(\mathbb R)[/math], due to:

Theorem

A linear map [math]f:\mathbb R^N\to\mathbb R^N[/math], written as

[[math]] f(v)=Av [[/math]]
is invertible precisely when [math]A[/math] is invertible, and in this case we have [math]f^{-1}(v)=A^{-1}v[/math].


Show Proof

This comes indeed from the fact that, with the notation [math]f_A(v)=Av[/math], we have the formula [math]f_Af_B=f_{AB}[/math]. Thus, we are led to the conclusion in the statement.

In order to study invertibility questions, for matrices or linear maps, let us begin with some examples. In the simplest case, in 2 dimensions, the result is as follows:

Theorem

We have the following inversion formula, for the [math]2\times2[/math] matrices:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}^{-1} =\frac{1}{ad-bc}\begin{pmatrix}d&-b\\ -c&a\end{pmatrix} [[/math]]
When [math]ad-bc=0[/math], the matrix is not invertible.


Show Proof

We have two assertions to be proved, the idea being as follows:


(1) As a first observation, when [math]ad-bc=0[/math], assuming that the first column of our matrix is nonzero, the second column must be proportional to it, so we must have, for some [math]\lambda\in\mathbb R[/math]:

[[math]] b=\lambda a\quad,\quad d=\lambda c [[/math]]


Thus our matrix must be of the following special type:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}=\begin{pmatrix}a&\lambda a\\ c&\lambda c\end{pmatrix} [[/math]]


But in this case the columns are proportional, so the linear map associated to the matrix is not invertible, and so the matrix itself is not invertible either. As for the remaining case, where the first column vanishes, here our matrix has a null column, so it is again not invertible.


(2) When [math]ad-bc\neq 0[/math], let us look for an inversion formula of the following type:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}^{-1} =\frac{1}{ad-bc}\begin{pmatrix}*&*\\ *&*\end{pmatrix} [[/math]]


We must therefore solve the following system of equations:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix} \begin{pmatrix}*&*\\ *&*\end{pmatrix}= \begin{pmatrix}ad-bc&0\\ 0&ad-bc\end{pmatrix} [[/math]]


But the solution to these equations is obvious, as follows:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix} \begin{pmatrix}d&-b\\ -c&a\end{pmatrix}= \begin{pmatrix}ad-bc&0\\ 0&ad-bc\end{pmatrix} [[/math]]


Thus, we are led to the formula in the statement.
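
As a numeric illustration of this formula, we have for instance:

[[math]] \begin{pmatrix}1&2\\ 3&4\end{pmatrix}^{-1} =\frac{1}{-2}\begin{pmatrix}4&-2\\ -3&1\end{pmatrix} =\begin{pmatrix}-2&1\\ 3/2&-1/2\end{pmatrix} [[/math]]


Indeed, a straightforward multiplication shows that the product of this matrix with the original one is the identity [math]1_2[/math].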

In order to deal now with the inversion problem in general, for the arbitrary matrices [math]A\in M_N(\mathbb R)[/math], we will use the same method as the one above, at [math]N=2[/math]. Let us write indeed our matrix as follows, with [math]v_1,\ldots,v_N\in\mathbb R^N[/math] being its column vectors:

[[math]] A=[v_1,\ldots,v_N] [[/math]]


We know from the above that, in order for the matrix [math]A[/math] to be invertible, the vectors [math]v_1,\ldots,v_N[/math] must be linearly independent. Thus, we are led into the question of understanding when a family of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] are linearly independent. In order to deal with this latter question, let us introduce the following notion:

Definition

Associated to any vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is the volume

[[math]] {\rm det}^+(v_1\ldots v_N)=vol \lt v_1,\ldots,v_N \gt [[/math]]
of the parallelepiped made by these vectors.

Here the volume is taken in the standard [math]N[/math]-dimensional sense. At [math]N=1[/math] this volume is a length, at [math]N=2[/math] this volume is an area, at [math]N=3[/math] this is the usual 3D volume, and so on. In general, the volume of a body [math]X\subset\mathbb R^N[/math] is by definition the number [math]vol(X)\in[0,\infty][/math] of copies of the unit cube [math]C\subset\mathbb R^N[/math] which are needed for filling [math]X[/math]. Now with this notion in hand, in relation with our inversion problem, we have the following statement:

Proposition

The quantity [math]{\rm det}^+[/math] that we constructed, regarded as a function of the corresponding square matrices, formed by column vectors,

[[math]] {\rm det}^+:M_N(\mathbb R)\to\mathbb R_+ [[/math]]
has the property that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when [math]{\rm det}^+(A) \gt 0[/math].


Show Proof

This follows from the fact that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when its column vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] are linearly independent. But this latter condition is equivalent to the fact that we must have the following strict inequality:

[[math]] vol \lt v_1,\ldots,v_N \gt \gt 0 [[/math]]


Thus, we are led to the conclusion in the statement.

Summarizing, all this leads us into the explicit computation of [math]{\rm det}^+[/math]. As a first observation, in 1 dimension we obtain the absolute value of the real numbers:

[[math]] {\rm det}^+(a)=|a| [[/math]]


In 2 dimensions now, the computation is non-trivial, and we have the following result, making the link with our main inversion result so far, namely Theorem 9.7:

Theorem

In [math]2[/math] dimensions we have the following formula,

[[math]] {\rm det}^+\begin{pmatrix}a&b\\ c&d\end{pmatrix}=|ad-bc| [[/math]]
with [math]{\rm det}^+:M_2(\mathbb R)\to\mathbb R_+[/math] being the function constructed above.


Show Proof

We must show that the area of the parallelogram formed by [math]\binom{a}{c},\binom{b}{d}[/math] equals [math]|ad-bc|[/math]. We can assume [math]a,b,c,d \gt 0[/math] for simplifying, the proof in general being similar. Moreover, by switching if needed the vectors [math]\binom{a}{c},\binom{b}{d}[/math], we can assume that we have:

[[math]] \frac{a}{c} \gt \frac{b}{d} [[/math]]


According to these conventions, the picture of our parallelogram is as follows:

[[math]] \xymatrix@R=10pt@C=15pt{ &&&\\ c+d&&&&\bullet\\ d&&\bullet\ar@{-}[urr]&&\\ c&&&\bullet\ar@{-}[uur]&\\ &\bullet\ar@{-}[urr]\ar[rrrr]\ar[uuuu]\ar@{-}[uur]&&&&\\ &&\ b\ &\ a\ &a+b} [[/math]]


Now let us slide the upper side downwards left, until we reach the [math]Oy[/math] axis. Our parallelogram, which has not changed its area in this process, becomes:

[[math]] \xymatrix@R=2pt@C=16pt{ &&&\\ c+d&&&&\circ\\ c+x&&&\bullet\ar@{.}[ur]&\\ d&&\circ\ar@{-}[ur]&&&&&\\ x&\bullet\ar@{-}[ur]\ar[uuuu]&&&\\ c&&&\bullet\ar@{-}[uuu]\ar@{.}[uuuur]&\\ &&&&&&\\ &\bullet\ar@{-}[uurr]\ar@{-}[uuu]\ar[rrrr]\ar@{.}[uuuur]&&&&\\ &&\ b\ &\ a\ &a+b} [[/math]]


We can further modify this parallelogram, once again by not altering its area, by sliding the right side downwards, until we reach the [math]Ox[/math] axis:

[[math]] \xymatrix@R=10pt@C=15pt{ &&&\\ c+x&&&\circ&\\ x&\bullet\ar@{.}[urr]\ar[uu]\ar@{-}[rr]&&\bullet\ar@{.}[u]&\\ c&&&\circ\ar@{-}[u]&\\ &\bullet\ar@{.}[urr]\ar@{-}[uu]\ar@{-}[rr]&&\bullet\ar@{-}[u]\ar[rr]&&\\ &&\ b\ &\ a\ &a+b} [[/math]]


Let us compute now the area. Since our two sliding operations have not changed the area of the original parallelogram, this area is given by:

[[math]] A=ax [[/math]]


In order to compute the quantity [math]x[/math], observe that in the context of the first move, we have two similar triangles, according to the following picture:

[[math]] \xymatrix@R=5pt@C=15pt{ &&&&\\ c+d&&&&\bullet\\ &&&&&&\\ d&\circ\ar@{.}[r]\ar[uuu]&\bullet\ar@{.}[rr]\ar@{-}[uurr]&&\circ\ar@{-}[uu]\\ x&\bullet\ar@{-}[u]\ar@{-}[ur]&&&\\ &&&&\\ &\ar@{-}[uu]\ar[rrrr]&&&&\\ &&\ b\ &\ a\ &a+b} [[/math]]


Thus, we are led to the following equation for the number [math]x[/math]:

[[math]] \frac{d-x}{b}=\frac{c}{a} [[/math]]


By solving this equation, we obtain the following value for [math]x[/math]:

[[math]] x=d-\frac{bc}{a} [[/math]]


Thus the area of our parallelogram, or rather of the final rectangle obtained from it, which has the same area as the original parallelogram, is given by:

[[math]] ax=ad-bc [[/math]]


Thus, we are led to the conclusion in the statement.

All this is very nice, and obviously we have a beginning of theory here. However, when looking carefully, we can see that our theory has a weakness, because:


  • In 1 dimension the number [math]a[/math], which is the simplest function of [math]a[/math] itself, is certainly a better quantity than the number [math]|a|[/math].
  • In 2 dimensions the number [math]ad-bc[/math], which is linear in each of the variables [math]a,b,c,d[/math], is certainly a better quantity than the number [math]|ad-bc|[/math].


So, let us upgrade now our theory, by constructing a better function, which takes signed values. In order to do this, we must come up with a way of splitting the systems of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] into two classes, call them positive and negative. And here, the answer is quite clear, because a bit of thinking leads to the following definition:

Definition

A system of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is called:

  • Oriented, if one can continuously pass from the standard basis to it, without the vectors becoming linearly dependent along the way.
  • Unoriented, otherwise.

The associated sign is [math]+[/math] in the oriented case, and [math]-[/math] in the unoriented case.

As a first example, in 1 dimension the basis consists of the single vector [math]e=1[/math], which can be continuously deformed into any vector [math]a \gt 0[/math]. Thus, the sign is the usual one:

[[math]] sgn(a)= \begin{cases} +&{\rm if}\ a \gt 0\\ -&{\rm if}\ a \lt 0 \end{cases} [[/math]]


Thus, in connection with our original question, we are definitely on the good track, because when multiplying [math]|a|[/math] by this sign we obtain [math]a[/math] itself, as desired:

[[math]] a=sgn(a)|a| [[/math]]


In 2 dimensions now, the explicit formula of the sign is as follows:

Proposition

We have the following formula, valid for any [math]2[/math] vectors in [math]\mathbb R^2[/math],

[[math]] sgn\left[\binom{a}{c},\binom{b}{d}\right]=sgn(ad-bc) [[/math]]
with the sign function on the right being the usual one, in [math]1[/math] dimension.


Show Proof

According to our conventions, the sign of [math]\binom{a}{c},\binom{b}{d}[/math] is as follows:


(1) The sign is [math]+[/math] when these vectors come in this precise order with respect to the counterclockwise rotation in the plane, around 0.


(2) The sign is [math]-[/math] otherwise, meaning when these vectors come in this order with respect to the clockwise rotation in the plane, around 0.


If we assume now [math]a,b,c,d \gt 0[/math] for simplifying, we are left with comparing the angles having the numbers [math]c/a[/math] and [math]d/b[/math] as tangents, and we obtain in this way:

[[math]] sgn\left[\binom{a}{c},\binom{b}{d}\right]= \begin{cases} +&{\rm if}\ \frac{c}{a} \lt \frac{d}{b}\\ -&{\rm if}\ \frac{c}{a} \gt \frac{d}{b} \end{cases} [[/math]]


But this gives the formula in the statement. The proof in general is similar.

Once again, in connection with our original question, we are on the good track, because when multiplying [math]|ad-bc|[/math] by this sign we obtain [math]ad-bc[/math] itself, as desired:

[[math]] ad-bc=sgn(ad-bc)|ad-bc| [[/math]]


At the level of the general results now, we have:

Proposition

The orientation of a system of vectors changes as follows:

  • If we switch the sign of a vector, the associated sign switches.
  • If we permute two vectors, the associated sign switches as well.


Show Proof

Both these assertions are clear from the definition of the sign, because the two operations in question change the orientation of the system of vectors.

With the above notion in hand, we can now formulate:

Definition

The determinant of [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is the signed volume

[[math]] \det(v_1\ldots v_N)=\pm vol \lt v_1,\ldots,v_N \gt [[/math]]
of the parallelepiped made by these vectors.

In other words, we are upgrading here Definition 9.8, by adding a sign to the quantity [math]{\rm det}^+[/math] constructed there, so as to potentially reach good additivity properties:

[[math]] \det(v_1\ldots v_N)=\pm {\rm det}^+(v_1\ldots v_N) [[/math]]


In relation with our original inversion problem for the square matrices, this upgrade does not change what we have so far, and we have the following statement:

Theorem

The quantity [math]\det[/math] that we constructed, regarded as a function of the corresponding square matrices, formed by column vectors,

[[math]] \det:M_N(\mathbb R)\to\mathbb R [[/math]]
has the property that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when [math]\det(A)\neq 0[/math].


Show Proof

We know from the above that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when [math]{\rm det}^+(A)=|\det A|[/math] is strictly positive, and this gives the result.

Let us try now to compute the determinant. In 1 dimension we have of course the formula [math]\det(a)=a[/math], because the absolute value fits, and so does the sign:

[[math]] \det(a) =sgn(a)\times|a| =a [[/math]]


In 2 dimensions now, we have the following result:

Theorem

In [math]2[/math] dimensions we have the following formula,

[[math]] \begin{vmatrix}a&b\\ c&d\end{vmatrix}=ad-bc [[/math]]
with [math]|\,.\,|=\det[/math] being the determinant function constructed above.


Show Proof

According to our definition, to the computation in Theorem 9.10, and to the sign formula from Proposition 9.12, the determinant of a [math]2\times2[/math] matrix is given by:

[[math]] \begin{eqnarray*} \det\begin{pmatrix}a&b\\ c&d\end{pmatrix} &=&sgn\left[\binom{a}{c},\binom{b}{d}\right]\times {\rm det}^+\begin{pmatrix}a&b\\ c&d\end{pmatrix}\\ &=&sgn\left[\binom{a}{c},\binom{b}{d}\right]\times|ad-bc|\\ &=&sgn(ad-bc)\times|ad-bc|\\ &=&ad-bc \end{eqnarray*} [[/math]]


Thus, we have obtained the formula in the statement.
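
As a numeric illustration, featuring the sign:

[[math]] \begin{vmatrix}3&1\\ 1&2\end{vmatrix}=6-1=5\quad,\quad \begin{vmatrix}1&3\\ 2&1\end{vmatrix}=1-6=-5 [[/math]]


Indeed, the parallelogram spanned by [math]\binom{3}{1},\binom{1}{2}[/math] has area [math]5[/math], and switching the two columns switches the orientation, and so the sign, as it should, by Proposition 9.13.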

9c. The determinant

In order to discuss now arbitrary dimensions, we will need a number of theoretical results. Here is a first series of formulae, coming straight from definitions:

Theorem

The determinant has the following properties:

  • When multiplying by scalars, the determinant gets multiplied as well:
    [[math]] \det(\lambda_1v_1,\ldots,\lambda_Nv_N)=\lambda_1\ldots\lambda_N\det(v_1,\ldots,v_N) [[/math]]
  • When permuting two columns, the determinant changes the sign:
    [[math]] \det(\ldots,u,\ldots,v,\ldots)=-\det(\ldots,v,\ldots,u,\ldots) [[/math]]
  • The determinant [math]\det(e_1,\ldots,e_N)[/math] of the standard basis of [math]\mathbb R^N[/math] is [math]1[/math].


Show Proof

All this is clear from definitions, as follows:


(1) This follows from definitions, and from Proposition 9.13 (1).


(2) This follows as well from definitions, and from Proposition 9.13 (2).


(3) This is clear from our definition of the determinant.

As an application of the above result, we have:

Theorem

The determinant of a diagonal matrix is given by:

[[math]] \begin{vmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N\end{vmatrix}=\lambda_1\ldots\lambda_N [[/math]]
That is, we obtain the product of diagonal entries, or of eigenvalues.


Show Proof

The formula in the statement is clear by using Theorem 9.17, which gives:

[[math]] \begin{vmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N\end{vmatrix} =\lambda_1\ldots\lambda_N \begin{vmatrix} 1\\ &\ddots\\ &&1\end{vmatrix} =\lambda_1\ldots\lambda_N [[/math]]


As for the last assertion, this is rather a remark.

In order to reach to a more advanced theory, let us adopt now the linear map point of view. In this setting, the definition of the determinant reformulates as follows:

Theorem

Given a linear map, written as [math]f(v)=Av[/math], its “inflation coefficient”, obtained as the signed volume of the image of the unit cube, is given by:

[[math]] I_f=\det A [[/math]]
More generally, [math]I_f[/math] is the inflation ratio of any parallelepiped in [math]\mathbb R^N[/math], via the transformation [math]f[/math]. In particular [math]f[/math] is invertible precisely when [math]\det A\neq0[/math].


Show Proof

The only non-trivial thing in all this is the fact that the inflation coefficient [math]I_f[/math], as defined above, is independent of the choice of the parallelepiped. But this is a generalization of the Thales theorem, which follows from the Thales theorem itself.

As a first application of the above linear map viewpoint, we have:

Theorem

We have the following formula, valid for any matrices [math]A,B[/math]:

[[math]] \det(AB)=\det A\cdot\det B [[/math]]
In particular, we have [math]\det(AB)=\det(BA)[/math].


Show Proof

The first formula follows from the formula [math]f_{AB}=f_Af_B[/math] for the associated linear maps. As for [math]\det(AB)=\det(BA)[/math], this is clear from the first formula.
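
As a useful consequence, for an invertible matrix [math]A\in M_N(\mathbb R)[/math] we have:

[[math]] \det(A^{-1})=\frac{1}{\det A} [[/math]]


Indeed, this follows from [math]\det A\cdot\det(A^{-1})=\det(AA^{-1})=\det(1_N)=1[/math], with the normalization [math]\det(1_N)=1[/math] coming from Theorem 9.17 (3).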

Getting back now to explicit computations, we have the following key result:

Theorem

The determinant of a diagonalizable matrix,

[[math]] A\sim\begin{pmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N\end{pmatrix} [[/math]]
is the product of its eigenvalues, [math]\det A=\lambda_1\ldots\lambda_N[/math].


Show Proof

We have not talked yet about diagonalization, and more on this in a moment, but the idea is that a matrix is diagonalizable when it can be written in the form [math]A=PDP^{-1}[/math], with [math]D=diag(\lambda_1,\ldots,\lambda_N)[/math]. Now by using Theorem 9.20, we obtain:

[[math]] \begin{eqnarray*} \det A &=&\det(PDP^{-1})\\ &=&\det(DP^{-1}P)\\ &=&\det D\\ &=&\lambda_1\ldots\lambda_N \end{eqnarray*} [[/math]]


Thus, we are led to the formula in the statement.

In general now, at the theoretical level, we have the following key result:

Theorem

The determinant has the additivity property

[[math]] \det(\ldots,u+v,\ldots) =\det(\ldots,u,\ldots) +\det(\ldots,v,\ldots) [[/math]]
valid for any choice of the vectors involved.


Show Proof

This follows by doing some elementary geometry, in the spirit of the computations in the proof of Theorem 9.10, as follows:


(1) We can either use the Thales theorem, and then compute the volumes of all the parallelepipeds involved, by using basic algebraic formulae.


(2) Or we can solve the problem in “puzzle” style, the idea being to cut the big parallelepiped, and then recover the small ones, after some manipulations.


(3) We can do as well something hybrid, consisting in deforming the parallelepipeds involved, without changing their volumes, and then cutting and gluing.

As a basic application of the above result, we have:

Theorem

We have the following results:

  • The determinant of a diagonal matrix is the product of diagonal entries.
  • The same is true for the upper triangular matrices.
  • The same is true for the lower triangular matrices.


Show Proof

All this can be deduced by using our various general formulae, as follows:


(1) This is something that we already know, from Theorem 9.21.


(2) This follows by using our various formulae, then (1), as follows:

[[math]] \begin{eqnarray*} \begin{vmatrix} \lambda_1&&&*\\ &\lambda_2\\ &&\ddots\\ 0&&&\lambda_N\end{vmatrix} &=&\begin{vmatrix} \lambda_1&0&&*\\ &\lambda_2\\ &&\ddots\\ 0&&&\lambda_N\end{vmatrix}\\ &&\vdots\\ &&\vdots\\ &=&\begin{vmatrix} \lambda_1&&&0\\ &\lambda_2\\ &&\ddots\\ 0&&&\lambda_N\end{vmatrix}\\ &=&\lambda_1\ldots\lambda_N \end{eqnarray*} [[/math]]


(3) This follows as well from our various formulae, then (1), by proceeding this time from right to left, from the last column towards the first column.

As an important theoretical result now, we have:

Theorem

The determinant of square matrices is the unique map

[[math]] \det:M_N(\mathbb R)\to\mathbb R [[/math]]
satisfying the conditions found above, namely additivity and homogeneity in each column, the change of sign when permuting two columns, and the normalization [math]\det(1_N)=1[/math].


Show Proof

Any map [math]\det':M_N(\mathbb R)\to\mathbb R[/math] satisfying our conditions must indeed coincide with [math]\det[/math] on the upper triangular matrices, and then, by performing operations on the columns, on all the matrices.

Here is now another important theoretical result:

Theorem

The determinant is subject to the row expansion formula

[[math]] \begin{eqnarray*} \begin{vmatrix}a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{NN}\end{vmatrix} &=&a_{11}\begin{vmatrix}a_{22}&\ldots&a_{2N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix} -a_{12}\begin{vmatrix}a_{21}&a_{23}&\ldots&a_{2N}\\ \vdots&\vdots&&\vdots\\ a_{N1}&a_{N3}&\ldots&a_{NN}\end{vmatrix}\\ &&+\ldots\ldots +(-1)^{N+1}a_{1N}\begin{vmatrix}a_{21}&\ldots&a_{2,N-1}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{N,N-1}\end{vmatrix} \end{eqnarray*} [[/math]]
and this method fully computes it, by recurrence.


Show Proof

This follows from the fact that the formula in the statement produces a certain function [math]\det:M_N(\mathbb R)\to\mathbb R[/math], which has the 4 properties in Theorem 9.24.

We can expand as well over the columns, as follows:

Theorem

The determinant is subject to the column expansion formula

[[math]] \begin{eqnarray*} \begin{vmatrix}a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{NN}\end{vmatrix} &=&a_{11}\begin{vmatrix}a_{22}&\ldots&a_{2N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix} -a_{21}\begin{vmatrix}a_{12}&\ldots&a_{1N}\\ a_{32}&\ldots&a_{3N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix}\\ &&+\ldots\ldots +(-1)^{N+1}a_{N1}\begin{vmatrix}a_{12}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N-1,2}&\ldots&a_{N-1,N}\end{vmatrix} \end{eqnarray*} [[/math]]
and this method fully computes it, by recurrence.


Show Proof

This follows by using the same argument as for the rows.

As a first application of the above methods, we can now prove:

Theorem

The determinant of the [math]3\times3[/math] matrices is given by

[[math]] \begin{vmatrix}a&b&c\\ d&e&f\\ g&h&i\end{vmatrix}=aei+bfg+cdh-ceg-bdi-afh [[/math]]
which can be memorized by using Sarrus' triangle method, “triangles parallel to the diagonal, minus triangles parallel to the antidiagonal”.


Show Proof

Here is the computation, using the above results:

[[math]] \begin{eqnarray*} \begin{vmatrix}a&b&c\\ d&e&f\\ g&h&i\end{vmatrix} &=&a\begin{vmatrix}e&f\\h&i\end{vmatrix} -b\begin{vmatrix}d&f\\g&i\end{vmatrix} +c\begin{vmatrix}d&e\\g&h\end{vmatrix}\\ &=&a(ei-fh)-b(di-fg)+c(dh-eg)\\ &=&aei-afh-bdi+bfg+cdh-ceg\\ &=&aei+bfg+cdh-ceg-bdi-afh \end{eqnarray*} [[/math]]


Thus, we obtain the formula in the statement.
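
As a numeric illustration of the Sarrus formula, we have for instance:

[[math]] \begin{vmatrix}1&2&3\\ 4&5&6\\ 7&8&10\end{vmatrix} =50+84+96-105-80-48=-3 [[/math]]


Observe also that with [math]10[/math] replaced by [math]9[/math] the same computation gives [math]45+84+96-105-72-48=0[/math], so that matrix is not invertible.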

Let us discuss now the general formula of the determinant, at arbitrary values [math]N\in\mathbb N[/math] of the matrix size, generalizing the formulae that we have at [math]N=2,3[/math]. We will need:

Definition

A permutation of [math]\{1,\ldots,N\}[/math] is a bijection, as follows:

[[math]] \sigma:\{1,\ldots,N\}\to\{1,\ldots,N\} [[/math]]
The set of such permutations is denoted [math]S_N[/math].

There are many possible notations for the permutations, the simplest one consisting in writing the numbers [math]1,\ldots,N[/math], and below them, their permuted versions:

[[math]] \sigma=\begin{pmatrix} 1&2&3&4&5\\ 2&1&4&5&3 \end{pmatrix} [[/math]]


Another method, which is better for most purposes, and faster too, remember that time is money, is by denoting permutations as diagrams, going from top to bottom:

[[math]] \xymatrix@R=3mm@C=3.5mm{ &\ar@{-}[ddr]&\ar@{-}[ddl]&\ar@{-}[ddrr]&\ar@{-}[ddl]&\ar@{-}[ddl]\\ \sigma=\\ &&&&&} [[/math]]


There are many interesting things that can be said about permutations. In what concerns us, we will need the following key result:

Theorem

The permutations have a signature function

[[math]] \varepsilon:S_N\to\{\pm1\} [[/math]]
which can be defined in the following equivalent ways:

  • As [math](-1)^c[/math], where [math]c[/math] is the number of inversions, meaning pairs of indices [math]i \lt j[/math] whose images appear in reverse order.
  • As [math](-1)^t[/math], where [math]t[/math] is the number of transpositions, in a decomposition of the permutation into transpositions.
  • As [math](-1)^o[/math], where [math]o[/math] is the number of cycles of even length, in the cycle decomposition of the permutation.
  • As [math](-1)^x[/math], where [math]x[/math] is the number of crossings, when the permutation is drawn as a diagram, as above.
  • As the sign of the corresponding permuted basis of [math]\mathbb R^N[/math].


Show Proof

This is standard combinatorics. The idea is that each of the five quantities in the statement gets multiplied by [math]-1[/math] when composing our permutation with a transposition, and that all of them equal [math]+1[/math] for the identity permutation. Since any permutation is a product of transpositions, these quantities must coincide, and their common value is the signature [math]\varepsilon(\sigma)[/math].
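
As an illustration, let us compute the signature of the permutation [math]\sigma\in S_5[/math] from the above example, the one sending [math]1,2,3,4,5[/math] to [math]2,1,4,5,3[/math]. The inversions are [math](1,2),(3,5),(4,5)[/math], so [math]c=3[/math]. Also, [math]\sigma[/math] switches [math]1,2[/math] and cyclically permutes [math]3,4,5[/math], so its cycle decomposition consists of a cycle of length [math]2[/math] and a cycle of length [math]3[/math], giving [math]o=1[/math]. Either way, we obtain:

[[math]] \varepsilon(\sigma)=(-1)^3=(-1)^1=-1 [[/math]]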

Now back to linear algebra, we can formulate a key result, as follows:

Theorem

We have the following formula for the determinant,

[[math]] \det A=\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{1\sigma(1)}\ldots A_{N\sigma(N)} [[/math]]
with the signature function being the one introduced above.


Show Proof

This follows by recurrence over [math]N\in\mathbb N[/math], as follows:


(1) When developing the determinant over the first column, we obtain a signed sum of [math]N[/math] determinants of size [math](N-1)\times(N-1)[/math]. But each of these determinants can be computed by developing over the first column too, and so on, and we are led to the conclusion that we have a formula as in the statement, with [math]\varepsilon(\sigma)\in\{-1,1\}[/math] being certain coefficients.


(2) But these latter coefficients [math]\varepsilon(\sigma)\in\{-1,1\}[/math] can only be the signatures of the corresponding permutations [math]\sigma\in S_N[/math], with this being something that can be viewed again by recurrence, with any of the definitions (1-5) in Theorem 9.29 for the signature.

The above result is something quite tricky, and in order to get familiar with it, there is nothing better than doing some computations. As a first, basic example, in 2 dimensions we recover the usual formula of the determinant, the details being as follows:

[[math]] \begin{eqnarray*} \begin{vmatrix}a&b\\ c&d\end{vmatrix} &=&\varepsilon(|\,|)\cdot ad+\varepsilon(\slash\hskip-2mm\backslash)\cdot cb\\ &=&1\cdot ad+(-1)\cdot cb\\ &=&ad-bc \end{eqnarray*} [[/math]]


In 3 dimensions now, we recover the Sarrus formula:

[[math]] \begin{vmatrix}a&b&c\\ d&e&f\\ g&h&i\end{vmatrix}=aei+bfg+cdh-ceg-bdi-afh [[/math]]


Let us record as well the following general formula, which was not obvious with our previous theory of the determinant, but which follows from Theorem 9.30, by summing there over [math]\sigma^{-1}[/math] instead of [math]\sigma[/math], and using [math]\varepsilon(\sigma)=\varepsilon(\sigma^{-1})[/math]:

[[math]] \det(A)=\det(A^t) [[/math]]


There are countless other applications of the formula in Theorem 9.30, and we will be back to this, on several occasions. But, most importantly, that formula allows us to deal now with the complex matrices too, by formulating the following statement:

Theorem

If we define the determinant of a complex matrix [math]A\in M_N(\mathbb C)[/math] to be

[[math]] \det A=\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{1\sigma(1)}\ldots A_{N\sigma(N)} [[/math]]
then this determinant has the same properties as the determinant of the real matrices.


Show Proof

This follows by doing some sort of reverse engineering, with respect to what has been done in this section, and we reach to the conclusion that [math]\det[/math] has indeed all the good properties that we are familiar with. Except of course for the properties at the very beginning of this section, in relation with volumes, which don't extend well to [math]\mathbb C^N[/math].

Good news, this is the end of the general theory that we wanted to develop. We have now in our bag all the needed techniques for computing the determinant.

9d. Diagonalization

Let us discuss now the diagonalization question for linear maps and matrices. The basic diagonalization theory, formulated in terms of matrices, is as follows:

Proposition

A vector [math]v\in\mathbb C^N[/math] is called eigenvector of [math]A\in M_N(\mathbb C)[/math], with corresponding eigenvalue [math]\lambda[/math], when [math]A[/math] multiplies by [math]\lambda[/math] in the direction of [math]v[/math]:

[[math]] Av=\lambda v [[/math]]
In the case where [math]\mathbb C^N[/math] has a basis [math]v_1,\ldots,v_N[/math] formed by eigenvectors of [math]A[/math], with corresponding eigenvalues [math]\lambda_1,\ldots,\lambda_N[/math], in this new basis [math]A[/math] becomes diagonal, as follows:

[[math]] A\sim\begin{pmatrix}\lambda_1\\&\ddots\\&&\lambda_N\end{pmatrix} [[/math]]
Equivalently, if we denote by [math]D=diag(\lambda_1,\ldots,\lambda_N)[/math] the above diagonal matrix, and by [math]P=[v_1\ldots v_N][/math] the square matrix formed by the eigenvectors of [math]A[/math], we have:

[[math]] A=PDP^{-1} [[/math]]
In this case we say that the matrix [math]A[/math] is diagonalizable.


Show Proof

This is something which is clear, the idea being as follows:


(1) The first assertion is clear, because the matrix which multiplies each basis element [math]v_i[/math] by a number [math]\lambda_i[/math] is precisely the diagonal matrix [math]D=diag(\lambda_1,\ldots,\lambda_N)[/math].


(2) The second assertion follows from the first one, by changing the basis. We can prove this by a direct computation as well, because we have [math]Pe_i=v_i[/math], and so:

[[math]] PDP^{-1}v_i =PDe_i =P\lambda_ie_i =\lambda_iPe_i =\lambda_iv_i [[/math]]


Thus, the matrices [math]A[/math] and [math]PDP^{-1}[/math] coincide, as stated.
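
As a basic example, consider the following matrix:

[[math]] A=\begin{pmatrix}2&1\\ 1&2\end{pmatrix} [[/math]]


Here [math]\binom{1}{1}[/math] is an eigenvector, with eigenvalue [math]3[/math], and [math]\binom{1}{-1}[/math] is an eigenvector too, with eigenvalue [math]1[/math]. Thus, this matrix is diagonalizable, as follows:

[[math]] \begin{pmatrix}2&1\\ 1&2\end{pmatrix} =\begin{pmatrix}1&1\\ 1&-1\end{pmatrix} \begin{pmatrix}3&0\\ 0&1\end{pmatrix} \begin{pmatrix}1&1\\ 1&-1\end{pmatrix}^{-1} [[/math]]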

In order to study the diagonalization problem, the idea is that the eigenvectors can be grouped into linear spaces, called eigenspaces, as follows:

Theorem

Let [math]A\in M_N(\mathbb C)[/math], and for any eigenvalue [math]\lambda\in\mathbb C[/math] define the corresponding eigenspace as being the vector space formed by the corresponding eigenvectors:

[[math]] E_\lambda=\left\{v\in\mathbb C^N\Big|Av=\lambda v\right\} [[/math]]
These eigenspaces [math]E_\lambda[/math] are then in a direct sum position, in the sense that given nonzero vectors [math]v_1\in E_{\lambda_1},\ldots,v_k\in E_{\lambda_k}[/math] corresponding to different eigenvalues [math]\lambda_1,\ldots,\lambda_k[/math], we have:

[[math]] \sum_ic_iv_i=0\implies c_i=0 [[/math]]
In particular, we have [math]\sum_\lambda\dim(E_\lambda)\leq N[/math], with the sum being over all the eigenvalues, and our matrix is diagonalizable precisely when we have equality.


Show Proof

We prove the first assertion by recurrence on [math]k\in\mathbb N[/math]. Assume by contradiction that we have a formula as follows, with the scalars [math]c_1,\ldots,c_k[/math] being not all zero:

[[math]] c_1v_1+\ldots+c_kv_k=0 [[/math]]


By dividing by one of these scalars, we can assume that our formula is:

[[math]] v_k=c_1v_1+\ldots+c_{k-1}v_{k-1} [[/math]]


Now let us apply [math]A[/math] to this vector. On the left we obtain:

[[math]] Av_k =\lambda_kv_k =\lambda_kc_1v_1+\ldots+\lambda_kc_{k-1}v_{k-1} [[/math]]


On the right we obtain something different, as follows:

[[math]] \begin{eqnarray*} A(c_1v_1+\ldots+c_{k-1}v_{k-1}) &=&c_1Av_1+\ldots+c_{k-1}Av_{k-1}\\ &=&c_1\lambda_1v_1+\ldots+c_{k-1}\lambda_{k-1}v_{k-1} \end{eqnarray*} [[/math]]


We conclude from this that the following equality must hold:

[[math]] \lambda_kc_1v_1+\ldots+\lambda_kc_{k-1}v_{k-1}=c_1\lambda_1v_1+\ldots+c_{k-1}\lambda_{k-1}v_{k-1} [[/math]]


On the other hand, we know by recurrence that the vectors [math]v_1,\ldots,v_{k-1}[/math] must be linearly independent. Thus, the coefficients must be equal, at right and at left:

[[math]] \lambda_kc_1=c_1\lambda_1 [[/math]]

[[math]] \vdots [[/math]]

[[math]] \lambda_kc_{k-1}=c_{k-1}\lambda_{k-1} [[/math]]


Now since at least one of the numbers [math]c_i[/math] must be nonzero, from [math]\lambda_kc_i=c_i\lambda_i[/math] we obtain [math]\lambda_k=\lambda_i[/math], which is a contradiction. Thus our proof by recurrence of the first assertion is complete. As for the second assertion, this follows from the first one.

In order to reach now to more advanced results, we can use the characteristic polynomial, which appears via the following fundamental result:

Theorem

Given a matrix [math]A\in M_N(\mathbb C)[/math], consider its characteristic polynomial:

[[math]] P(x)=\det(A-x1_N) [[/math]]
The eigenvalues of [math]A[/math] are then the roots of [math]P[/math]. Also, we have the inequality

[[math]] \dim(E_\lambda)\leq m_\lambda [[/math]]
where [math]m_\lambda[/math] is the multiplicity of [math]\lambda[/math], as root of [math]P[/math].


Show Proof

The first assertion follows from the following computation, using the fact that a linear map is bijective precisely when the determinant of the associated matrix is nonzero:

[[math]] \begin{eqnarray*} \exists v\neq0,\,Av=\lambda v &\iff&\exists v\neq0,\,(A-\lambda 1_N)v=0\\ &\iff&\det(A-\lambda 1_N)=0 \end{eqnarray*} [[/math]]


Regarding now the second assertion, given an eigenvalue [math]\lambda[/math] of our matrix [math]A[/math], consider the dimension [math]d_\lambda=\dim(E_\lambda)[/math] of the corresponding eigenspace. By changing the basis of [math]\mathbb C^N[/math], as for the eigenspace [math]E_\lambda[/math] to be spanned by the first [math]d_\lambda[/math] basis elements, our matrix becomes as follows, with [math]B[/math] being a certain smaller matrix:

[[math]] A\sim\begin{pmatrix}\lambda 1_{d_\lambda}&0\\0&B\end{pmatrix} [[/math]]


We conclude that the characteristic polynomial of [math]A[/math] is of the following form:

[[math]] P_A =P_{\lambda 1_{d_\lambda}}P_B =(\lambda-x)^{d_\lambda}P_B [[/math]]


Thus the multiplicity [math]m_\lambda[/math] of our eigenvalue [math]\lambda[/math], as a root of [math]P[/math], satisfies [math]m_\lambda\geq d_\lambda[/math], and this leads to the conclusion in the statement.

Now recall that we are over [math]\mathbb C[/math], where any polynomial decomposes into linear factors, by the fundamental theorem of algebra. We obtain the following result:

Theorem

Given a matrix [math]A\in M_N(\mathbb C)[/math], consider its characteristic polynomial

[[math]] P(X)=\det(A-X1_N) [[/math]]

then factorize this polynomial, by computing the complex roots, with multiplicities,

[[math]] P(X)=(-1)^N(X-\lambda_1)^{n_1}\ldots(X-\lambda_k)^{n_k} [[/math]]
and finally compute the corresponding eigenspaces, for each eigenvalue found:

[[math]] E_i=\left\{v\in\mathbb C^N\Big|Av=\lambda_iv\right\} [[/math]]
The dimensions of these eigenspaces satisfy then the following inequalities,

[[math]] \dim(E_i)\leq n_i [[/math]]
and [math]A[/math] is diagonalizable precisely when we have equality for any [math]i[/math].


Show Proof

This follows by combining the above results. Indeed, by summing the inequalities [math]\dim(E_\lambda)\leq m_\lambda[/math] from Theorem 9.34, we obtain an inequality as follows:

[[math]] \sum_\lambda\dim(E_\lambda)\leq\sum_\lambda m_\lambda\leq N [[/math]]


On the other hand, we know from Theorem 9.33 that our matrix is diagonalizable when we have global equality. Thus, we are led to the conclusion in the statement.
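
As a basic example where this criterion fails, consider the following matrix:

[[math]] A=\begin{pmatrix}0&1\\ 0&0\end{pmatrix} [[/math]]


Here the characteristic polynomial is [math]P(X)=X^2[/math], so the only eigenvalue is [math]\lambda=0[/math], with multiplicity [math]n_1=2[/math]. On the other hand, the equation [math]Av=0[/math] shows that [math]E_0[/math] is one-dimensional, spanned by [math]\binom{1}{0}[/math], so [math]\dim(E_0)=1 \lt 2=n_1[/math], and our matrix is not diagonalizable.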

As an illustration for all this, which is a must-know computation, we have:

Proposition

The rotation of angle [math]t\in\mathbb R[/math] in the plane diagonalizes as:

[[math]] \begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix} =\frac{1}{2}\begin{pmatrix}1&1\\i&-i\end{pmatrix} \begin{pmatrix}e^{-it}&0\\0&e^{it}\end{pmatrix} \begin{pmatrix}1&-i\\1&i\end{pmatrix} [[/math]]
Over the reals this is impossible, unless [math]t=0,\pi[/math], where the rotation is diagonal.


Show Proof

Observe first that, as indicated, unless we are in the case [math]t=0,\pi[/math], where our rotation is [math]\pm1_2[/math], our rotation is a “true” rotation, having no eigenvectors in the plane. Fortunately the complex numbers come to the rescue, via the following computation:

[[math]] \begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{i} =\binom{\cos t-i\sin t}{i\cos t+\sin t} =e^{-it}\binom{1}{i} [[/math]]


We have as well a second complex eigenvector, coming from:

[[math]] \begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{-i} =\binom{\cos t+i\sin t}{-i\cos t+\sin t} =e^{it}\binom{1}{-i} [[/math]]


Thus, we are led to the conclusion in the statement.

At the level of basic examples of diagonalizable matrices, we first have the following result, which provides us with the “generic” examples:

Theorem

For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent,

  • The eigenvalues are different, [math]\lambda_i\neq\lambda_j[/math],
  • The characteristic polynomial [math]P[/math] has simple roots,
  • The characteristic polynomial satisfies [math](P,P')=1[/math],
  • The resultant of [math]P,P'[/math] is nonzero, [math]R(P,P')\neq0[/math],
  • The discriminant of [math]P[/math] is nonzero, [math]\Delta(P)\neq0[/math],

and in this case, the matrix is diagonalizable.


Show Proof

The last assertion holds indeed, due to Theorem 9.35. As for the equivalences in the statement, these are all standard, by using the theory of [math]R,\Delta[/math] from chapter 5.

As already mentioned, one can prove that the matrices having distinct eigenvalues are “generic”, and so the above result basically captures the whole situation. We have in fact the following collection of density results, which are quite advanced:

Theorem

The following happen, inside [math]M_N(\mathbb C)[/math]:

  • The invertible matrices are dense.
  • The matrices having distinct eigenvalues are dense.
  • The diagonalizable matrices are dense.


Show Proof

These are quite advanced results, which can be proved as follows:


(1) This is clear, intuitively speaking, because the invertible matrices are given by the condition [math]\det A\neq 0[/math]. Thus, the set formed by these matrices appears as the complement of the hypersurface [math]\det A=0[/math], and so must be dense inside [math]M_N(\mathbb C)[/math], as claimed.


(2) Here we can use a similar argument, this time by saying that the set formed by the matrices having distinct eigenvalues appears as the complement of the hypersurface given by [math]\Delta(P_A)=0[/math], and so must be dense inside [math]M_N(\mathbb C)[/math], as claimed.


(3) This follows from (2), via the fact that the matrices having distinct eigenvalues are diagonalizable, that we know from Theorem 9.37. There are of course some other proofs as well, for instance by putting the matrix in Jordan form.

This was for the basic theory of the linear maps, that we will need in what follows. There are of course far more things that can be said, and we will be back to this gradually, when needed, and notably in chapter 12 below, when doing advanced calculus.

General references

Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].