Linear maps
9a. Linear maps
In the remainder of this book we study functions of several variables. Things here are quite tricky, and we will do this slowly, over a whole 200 pages. Let us start with the linear functions, which are the simplest. We have here the following result:
The linear maps [math]f:\mathbb R^N\to\mathbb R^M[/math] are in correspondence with the matrices [math]A\in M_{M\times N}(\mathbb R)[/math], with the linear map associated to such a matrix being
The first assertion is clear, because a linear map [math]f:\mathbb R^N\to\mathbb R^M[/math] must send a vector [math]x\in\mathbb R^N[/math] to a certain vector [math]f(x)\in\mathbb R^M[/math], all of whose components are linear combinations of the components of [math]x[/math]. Thus, we can write, for certain numbers [math]A_{ij}\in\mathbb R[/math]:
Now the parameters [math]A_{ij}\in\mathbb R[/math] can be regarded as being the entries of a rectangular matrix [math]A\in M_{M\times N}(\mathbb R)[/math], and with the usual convention for the rectangular matrix multiplication, the above formula is precisely the one in the statement, namely:
Regarding the second assertion, with [math]f(x)=Ax[/math] as above, if we denote by [math]e_1,\ldots,e_N[/math] the standard basis of [math]\mathbb R^N[/math], then we have the following formula:
But this gives [math] \lt f(e_j),e_i \gt =A_{ij}[/math], as desired. As for the last assertion, regarding the complex maps and matrices, the proof here is similar, with the complex scalar product being by definition given, here and in what follows, by [math] \lt x,y \gt =\sum_ix_i\bar{y}_i[/math].
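For the reader who likes to experiment, here is a quick numerical illustration of this correspondence, written as a small Python/numpy sketch; the sizes and the random test matrix below are of course just an illustration, not part of the statement.

```python
# Checking the correspondence f(x) = Ax, and the entry formula A_ij = <f(e_j), e_i>,
# here over the real numbers, on a random example.
import numpy as np

M, N = 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((M, N))      # a random M x N real matrix
f = lambda x: A @ x                  # the associated linear map f(x) = Ax

E = np.eye(N)                        # standard basis of R^N, as columns
recovered = np.column_stack([f(E[:, j]) for j in range(N)])
print(np.allclose(recovered, A))     # True: the j-th column of A is f(e_j)
```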
At the level of examples, let us first discuss the linear maps [math]f:\mathbb R^2\to\mathbb R^2[/math]. We have:
The rotation of angle [math]t\in\mathbb R[/math], and the symmetry with respect to the [math]Ox[/math] axis rotated by an angle [math]t/2\in\mathbb R[/math], are given by the matrices
The rotation being linear, it must correspond to a certain matrix:
We can guess this matrix, via its action on the basic coordinate vectors [math]\binom{1}{0}[/math] and [math]\binom{0}{1}[/math]. Indeed, a quick picture in the plane shows that we must have:
Guessing now the matrix is not complicated, because the first equality gives us the first column, and the second equality gives us the second column:
Thus, we can just put together these two vectors, and we obtain our matrix [math]R_t[/math]. As for the symmetry, the proof here is similar, again by computing [math]S_t\binom{1}{0}[/math] and [math]S_t\binom{0}{1}[/math].
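Here is a short numerical check, using the standard formulas [math]R_t=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}[/math] and [math]S_t=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix}[/math], with an arbitrary test angle:

```python
# Numerical check of the rotation R_t and of the symmetry S_t with respect to the
# Ox axis rotated by t/2, via their action on the basis and their basic properties.
import numpy as np

t = 0.7
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t),  np.cos(t)]])
S = np.array([[np.cos(t),  np.sin(t)], [np.sin(t), -np.cos(t)]])

print(np.allclose(R @ [1, 0], [np.cos(t), np.sin(t)]))   # first column of R_t, as in the proof
print(np.allclose(R @ R.T, np.eye(2)))                   # a rotation is an isometry
print(np.allclose(S @ S, np.eye(2)))                     # a symmetry squares to the identity
```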
Let us record as well a result regarding the projections, as follows:
The projection on the [math]Ox[/math] axis rotated by an angle [math]t/2\in\mathbb R[/math] is
A quick picture in the plane, using similarity of triangles, and the basic trigonometry formulae for the duplication of angles, shows that we must have:
Similarly, another quick picture plus trigonometry show that we must have:
Now by putting together these two vectors, we obtain our matrix.
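Here as well, a quick numerical sketch can be run, with the projection written, via the double-angle formulae, as [math]P_t=\frac{1}{2}\begin{pmatrix}1+\cos t&\sin t\\ \sin t&1-\cos t\end{pmatrix}[/math]; the value of [math]t[/math] below is just a test value.

```python
# The projection P_t onto the Ox axis rotated by t/2: it equals u u^T for the unit
# vector u = (cos(t/2), sin(t/2)), it is idempotent, and it fixes its axis.
import numpy as np

t = 0.7
P = 0.5 * np.array([[1 + np.cos(t), np.sin(t)], [np.sin(t), 1 - np.cos(t)]])
u = np.array([np.cos(t / 2), np.sin(t / 2)])   # unit vector along the rotated axis

print(np.allclose(P, np.outer(u, u)))   # P_t = u u^T
print(np.allclose(P @ P, P))            # P_t^2 = P_t
print(np.allclose(P @ u, u))            # the axis is fixed
```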
Back to Theorem 9.1, our claim is that, no matter what we want to do with [math]f[/math] or [math]A[/math], we will run at some point into their adjoints [math]f^*[/math] and [math]A^*[/math], constructed as follows:
The adjoint linear map [math]f^*:\mathbb C^N\to\mathbb C^N[/math], which is given by
Given a linear map [math]f:\mathbb C^N\to\mathbb C^N[/math], fix [math]y\in\mathbb C^N[/math], and consider the linear form [math]\varphi(x)= \lt f(x),y \gt [/math]. This form must be as follows, for a certain vector [math]f^*(y)\in\mathbb C^N[/math]:
Thus, we have constructed a map [math]y\to f^*(y)[/math] as in the statement, which is obviously linear, and that we can call [math]f^*[/math]. Now by taking the vectors [math]x,y\in\mathbb C^N[/math] to be elements of the standard basis of [math]\mathbb C^N[/math], our defining formula for [math]f^*[/math] reads:
By reversing the scalar product on the right, this formula can be written as:
But this means that the matrix of [math]f^*[/math] is given by [math](A^*)_{ij}=\bar{A}_{ji}[/math], as desired.
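The defining property of the adjoint is easy to test numerically as well; here is a minimal numpy sketch, with a random complex matrix and random vectors as test data.

```python
# Checking <Ax, y> = <x, A* y>, with <x, y> = sum_i x_i conj(y_i), and with A* being
# the conjugate transpose, (A*)_ij = conj(A_ji).
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

dot = lambda u, v: np.sum(u * np.conj(v))                  # <u, v>
print(np.isclose(dot(A @ x, y), dot(x, A.conj().T @ y)))   # True
```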
Getting back to our claim, the adjoints [math]*[/math] are indeed ubiquitous, as shown by:
The following happen:
- [math]f(x)=Ux[/math] with [math]U\in M_N(\mathbb C)[/math] is an isometry precisely when [math]U^*=U^{-1}[/math].
- [math]f(x)=Px[/math] with [math]P\in M_N(\mathbb C)[/math] is a projection precisely when [math]P^2=P^*=P[/math].
Let us first recall that the lengths, or norms, of the vectors [math]x\in\mathbb C^N[/math] can be recovered from the knowledge of the scalar products, as follows:
Conversely, we can recover the scalar products out of the norms, by using the following difficult to remember formula, called the complex polarization identity:
Finally, we will use Theorem 9.4, and more specifically the following formula coming from there, valid for any matrix [math]A\in M_N(\mathbb C)[/math] and any two vectors [math]x,y\in\mathbb C^N[/math]:
(1) Given a matrix [math]U\in M_N(\mathbb C)[/math], we have indeed the following equivalences, with the first one coming from the polarization identity, and with the other ones being clear:
(2) Given a matrix [math]P\in M_N(\mathbb C)[/math], in order for [math]x\to Px[/math] to be an oblique projection, we must have [math]P^2=P[/math]. Now observe that this projection is orthogonal when:
The point now is that by conjugating the last formula, we obtain [math]P^*P=P[/math]. Thus we must have [math]P=P^*[/math], and this gives the result.
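Both characterizations can be illustrated numerically, for instance with the rotation and projection matrices from the beginning of this section; a short sketch, with an arbitrary test angle:

```python
# An isometry satisfies U* = U^{-1}, and an orthogonal projection satisfies P^2 = P* = P.
import numpy as np

t = 1.1
U = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])   # a rotation
u = np.array([np.cos(t / 2), np.sin(t / 2)])
P = np.outer(u, u)                                                # projection onto R u

print(np.allclose(U.conj().T, np.linalg.inv(U)))                  # U* = U^{-1}
print(np.allclose(P @ P, P) and np.allclose(P, P.conj().T))       # P^2 = P = P*
```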
Summarizing, the linear operators come in pairs [math]T,T^*[/math], and the associated matrices come as well in pairs [math]A,A^*[/math]. We will keep this in mind, and come back to it later.
9b. Matrix inversion
We have seen so far that most of the interesting maps [math]f:\mathbb R^N\to\mathbb R^N[/math] that we know, such as the rotations, symmetries and projections, are linear, and can be written in the following form, with [math]A\in M_N(\mathbb R)[/math] being a square matrix:
We develop now more general theory for such linear maps. We will be interested in the question of inverting the linear maps [math]f:\mathbb R^N\to\mathbb R^N[/math]. And the point is that this is the same question as inverting the corresponding matrices [math]A\in M_N(\mathbb R)[/math], due to:
A linear map [math]f:\mathbb R^N\to\mathbb R^N[/math], written as
This comes indeed from the fact that, with the notation [math]f_A(v)=Av[/math], we have the formula [math]f_Af_B=f_{AB}[/math]. Thus, we are led to the conclusion in the statement.
In order to study invertibility questions, for matrices or linear maps, let us begin with some examples. In the simplest case, in 2 dimensions, the result is as follows:
We have the following inversion formula, for the [math]2\times2[/math] matrices:
We have two assertions to be proved, the idea being as follows:
(1) As a first observation, when [math]ad-bc=0[/math] we must have, for some [math]\lambda\in\mathbb R[/math]:
Thus our matrix must be of the following special type:
But in this case the columns are proportional, so the linear map associated to the matrix is not invertible, and so the matrix itself is not invertible either.
(2) When [math]ad-bc\neq 0[/math], let us look for an inversion formula of the following type:
We must therefore solve the following system of equations:
But the solution to these equations is obvious, as follows:
Thus, we are led to the formula in the statement.
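For concreteness, here is a numerical check of the [math]2\times2[/math] inversion formula [math]A^{-1}=\frac{1}{ad-bc}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}[/math] against a general-purpose inverse; the entries below are arbitrary test values with [math]ad-bc\neq0[/math].

```python
# The 2x2 inversion formula, checked against numpy's inverse.
import numpy as np

a, b, c, d = 2.0, 1.0, 5.0, 3.0                 # any values with ad - bc != 0
A = np.array([[a, b], [c, d]])
A_inv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)

print(np.allclose(A_inv, np.linalg.inv(A)))     # matches the general inverse
print(np.allclose(A @ A_inv, np.eye(2)))        # A A^{-1} = 1_2
```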
In order to deal now with the inversion problem in general, for the arbitrary matrices [math]A\in M_N(\mathbb R)[/math], we will use the same method as the one above, at [math]N=2[/math]. Let us write indeed our matrix as follows, with [math]v_1,\ldots,v_N\in\mathbb R^N[/math] being its column vectors:
We know from the above that, in order for the matrix [math]A[/math] to be invertible, the vectors [math]v_1,\ldots,v_N[/math] must be linearly independent. Thus, we are led into the question of understanding when a family of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is linearly independent. In order to deal with this latter question, let us introduce the following notion:
Associated to any vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is the volume
Here the volume is taken in the standard [math]N[/math]-dimensional sense. At [math]N=1[/math] this volume is a length, at [math]N=2[/math] this volume is an area, at [math]N=3[/math] this is the usual 3D volume, and so on. In general, the volume of a body [math]X\subset\mathbb R^N[/math] is by definition the number [math]vol(X)\in[0,\infty][/math] of copies of the unit cube [math]C\subset\mathbb R^N[/math] which are needed for filling [math]X[/math]. Now with this notion in hand, in relation with our inversion problem, we have the following statement:
The quantity [math]{\rm det}^+[/math] that we constructed, regarded as a function of the corresponding square matrices, formed by column vectors,
This follows from the fact that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when its column vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] are linearly independent. But this latter condition is equivalent to the fact that we must have the following strict inequality:
Thus, we are led to the conclusion in the statement.
Summarizing, all this leads us into the explicit computation of [math]{\rm det}^+[/math]. As a first observation, in 1 dimension we obtain the absolute value of the real numbers:
In 2 dimensions now, the computation is non-trivial, and we have the following result, making the link with our main inversion result so far, namely Theorem 9.7:
In [math]2[/math] dimensions we have the following formula,
We must show that the area of the parallelogram formed by [math]\binom{a}{c},\binom{b}{d}[/math] equals [math]|ad-bc|[/math]. We can assume [math]a,b,c,d \gt 0[/math] for simplicity, the proof in general being similar. Moreover, by switching the vectors [math]\binom{a}{c},\binom{b}{d}[/math] if needed, we can assume that we have:
According to these conventions, the picture of our parallelogram is as follows:
Now let us slide the upper side down and to the left, until we reach the [math]Oy[/math] axis. Our parallelogram, which has not changed its area in this process, becomes:
We can further modify this parallelogram, once again by not altering its area, by sliding the right side downwards, until we reach the [math]Ox[/math] axis:
Let us compute now the area. Since our two sliding operations have not changed the area of the original parallelogram, this area is given by:
In order to compute the quantity [math]x[/math], observe that in the context of the first move, we have two similar triangles, according to the following picture:
Thus, we are led to the following equation for the number [math]x[/math]:
By solving this equation, we obtain the following value for [math]x[/math]:
Thus the area of our parallelogram, or rather of the final rectangle obtained from it, which has the same area as the original parallelogram, is given by:
Thus, we are led to the conclusion in the statement.
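As an independent verification of this area computation, one can also apply the shoelace formula to the parallelogram with vertices [math]0,\binom{a}{c},\binom{a+b}{c+d},\binom{b}{d}[/math]; a short sketch, with arbitrary test values:

```python
# The shoelace formula, applied to the parallelogram spanned by (a,c) and (b,d),
# recovers the area |ad - bc|.
import numpy as np

a, b, c, d = 3.0, 1.0, 1.0, 2.0
verts = np.array([[0, 0], [a, c], [a + b, c + d], [b, d]])
x, y = verts[:, 0], verts[:, 1]
area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

print(np.isclose(area, abs(a * d - b * c)))   # True
```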
All this is very nice, and obviously we have a beginning of theory here. However, when looking carefully, we can see that our theory has a weakness, because:
- In 1 dimension the number [math]a[/math], which is the simplest function of [math]a[/math] itself, is certainly a better quantity than the number [math]|a|[/math].
- In 2 dimensions the number [math]ad-bc[/math], which is linear in [math]a,b,c,d[/math], is certainly a better quantity than the number [math]|ad-bc|[/math].
So, let us upgrade now our theory, by constructing a better function, which takes signed values. In order to do this, we must come up with a way of splitting the systems of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] into two classes, call them positive and negative. And here, the answer is quite clear, because a bit of thinking leads to the following definition:
A system of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is called:
- Oriented, if one can continuously pass from the standard basis to it.
- Unoriented, otherwise.
The associated sign is [math]+[/math] in the oriented case, and [math]-[/math] in the unoriented case.
As a first example, in 1 dimension the basis consists of the single vector [math]e=1[/math], which can be continuously deformed into any vector [math]a \gt 0[/math]. Thus, the sign is the usual one:
Thus, in connection with our original question, we are definitely on the good track, because when multiplying [math]|a|[/math] by this sign we obtain [math]a[/math] itself, as desired:
In 2 dimensions now, the explicit formula of the sign is as follows:
We have the following formula, valid for any [math]2[/math] vectors in [math]\mathbb R^2[/math],
According to our conventions, the sign of [math]\binom{a}{c},\binom{b}{d}[/math] is as follows:
(1) The sign is [math]+[/math] when these vectors come in this precise order with respect to the counterclockwise rotation in the plane, around 0.
(2) The sign is [math]-[/math] otherwise, meaning when these vectors come in this order with respect to the clockwise rotation in the plane, around 0.
If we assume now [math]a,b,c,d \gt 0[/math] for simplifying, we are left with comparing the angles having the numbers [math]c/a[/math] and [math]d/b[/math] as tangents, and we obtain in this way:
But this gives the formula in the statement. The proof in general is similar.
Once again, in connection with our original question, we are on the good track, because when multiplying [math]|ad-bc|[/math] by this sign we obtain [math]ad-bc[/math] itself, as desired:
At the level of the general results now, we have:
The orientation of a system of vectors changes as follows:
- If we switch the sign of a vector, the associated sign switches.
- If we permute two vectors, the associated sign switches as well.
Both these assertions are clear from the definition of the sign, because the two operations in question change the orientation of the system of vectors.
With the above notion in hand, we can now formulate:
The determinant of [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is the signed volume
In other words, we are upgrading here Definition 9.8, by adding a sign to the quantity [math]{\rm det}^+[/math] constructed there, so as to potentially reach good additivity properties:
In relation with our original inversion problem for the square matrices, this upgrade does not change what we have so far, and we have the following statement:
The quantity [math]\det[/math] that we constructed, regarded as a function of the corresponding square matrices, formed by column vectors,
We know from the above that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when [math]{\rm det}^+(A)=|\det A|[/math] is strictly positive, and this gives the result.
Let us try now to compute the determinant. In 1 dimension we have of course the formula [math]\det(a)=a[/math], because the absolute value fits, and so does the sign:
In 2 dimensions now, we have the following result:
In [math]2[/math] dimensions we have the following formula,
According to our definition, to the computation in Theorem 9.10, and to the sign formula from Proposition 9.12, the determinant of a [math]2\times2[/math] matrix is given by:
Thus, we have obtained the formula in the statement.
9c. The determinant
In order to discuss now arbitrary dimensions, we will need a number of theoretical results. Here is a first series of formulae, coming straight from definitions:
The determinant has the following properties:
- When multiplying by scalars, the determinant gets multiplied as well:
[[math]] \det(\lambda_1v_1,\ldots,\lambda_Nv_N)=\lambda_1\ldots\lambda_N\det(v_1,\ldots,v_N) [[/math]]
- When permuting two columns, the determinant changes the sign:
[[math]] \det(\ldots,u,\ldots,v,\ldots)=-\det(\ldots,v,\ldots,u,\ldots) [[/math]]
- The determinant [math]\det(e_1,\ldots,e_N)[/math] of the standard basis of [math]\mathbb R^N[/math] is [math]1[/math].
All this is clear from definitions, as follows:
(1) This follows from definitions, and from Proposition 9.13 (1).
(2) This follows as well from definitions, and from Proposition 9.13 (2).
(3) This is clear from our definition of the determinant.
As an application of the above result, we have:
The determinant of a diagonal matrix is given by:
The formula in the statement is clear by using Theorem 9.17, which gives:
As for the last assertion, this is rather a remark.
In order to reach a more advanced theory, let us adopt now the linear map point of view. In this setting, the definition of the determinant reformulates as follows:
Given a linear map, written as [math]f(v)=Av[/math], its “inflation coefficient”, obtained as the signed volume of the image of the unit cube, is given by:
The only non-trivial thing in all this is the fact that the inflation coefficient [math]I_f[/math], as defined above, is independent of the choice of the parallelepiped. But this is a generalization of the Thales theorem, which follows from the Thales theorem itself.
As a first application of the above linear map viewpoint, we have:
We have the following formula, valid for any matrices [math]A,B[/math]:
The first formula follows from the formula [math]f_{AB}=f_Af_B[/math] for the associated linear maps. As for [math]\det(AB)=\det(BA)[/math], this is clear from the first formula.
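Here is a quick random test of these multiplicativity properties, with numpy:

```python
# det(AB) = det(A) det(B), and in particular det(AB) = det(BA).
import numpy as np

rng = np.random.default_rng(3)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A)))
```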
Getting back now to explicit computations, we have the following key result:
The determinant of a diagonalizable matrix,
We have not talked yet about diagonalization, and more on this in a moment, but the idea is that a matrix is diagonalizable when it can be written in the form [math]A=PDP^{-1}[/math], with [math]D=diag(\lambda_1,\ldots,\lambda_N)[/math]. Now by using Theorem 9.20, we obtain:
Thus, we are led to the formula in the statement.
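Numerically, this is easy to see on a random matrix, which is generically diagonalizable; a minimal sketch:

```python
# For a diagonalizable matrix, the determinant is the product of the eigenvalues.
import numpy as np

A = np.random.default_rng(4).standard_normal((4, 4))
eigvals = np.linalg.eigvals(A)                          # complex in general

print(np.isclose(np.prod(eigvals), np.linalg.det(A)))   # True
```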
In general now, at the theoretical level, we have the following key result:
The determinant has the additivity property
This follows by doing some elementary geometry, in the spirit of the computations in the proof of Theorem 9.10, as follows:
(1) We can either use the Thales theorem, and then compute the volumes of all the parallelepipeds involved, by using basic algebraic formulae.
(2) Or we can solve the problem in “puzzle” style, the idea being to cut the big parallelepiped, and then recover the small ones, after some manipulations.
(3) We can do as well something hybrid, consisting in deforming the parallelepipeds involved, without changing their volumes, and then cutting and gluing.
As a basic application of the above result, we have:
We have the following results:
- The determinant of a diagonal matrix is the product of diagonal entries.
- The same is true for the upper triangular matrices.
- The same is true for the lower triangular matrices.
All this can be deduced by using our various general formulae, as follows:
(1) This is something that we already know, from Theorem 9.21.
(2) This follows by using our various formulae, then (1), as follows:
(3) This follows as well from our various formulae, then (1), by proceeding this time from right to left, from the last column towards the first column.
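Here is a small numerical illustration of these three facts, with random triangular test matrices:

```python
# The determinant of diagonal and triangular matrices is the product of the diagonal entries.
import numpy as np

rng = np.random.default_rng(2)
D = np.diag([2.0, -1.0, 3.0, 0.5])
T = np.triu(rng.standard_normal((4, 4)))       # upper triangular
L = np.tril(rng.standard_normal((4, 4)))       # lower triangular

for X in (D, T, L):
    print(np.isclose(np.linalg.det(X), np.prod(np.diag(X))))
```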
As an important theoretical result now, we have:
The determinant of square matrices is the unique map
Any map [math]\det':M_N(\mathbb R)\to\mathbb R[/math] satisfying our conditions must indeed coincide with [math]\det[/math] on the upper triangular matrices, and then on all the matrices.
Here is now another important theoretical result:
The determinant is subject to the row expansion formula
This follows from the fact that the formula in the statement produces a certain function [math]\det:M_N(\mathbb R)\to\mathbb R[/math], which has the 4 properties in Theorem 9.24.
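Since the row expansion is recursive by nature, it is instructive to implement it; here is a minimal sketch of the expansion along the first row, checked against a library determinant (the statement's formula, over an arbitrary row, works in the same way).

```python
# Laplace expansion along the first row: det(A) = sum_j (-1)^j A_{1j} det(minor_{1j}).
import numpy as np

def det_expand(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # erase row 0 and column j
        total += (-1) ** j * A[0, j] * det_expand(minor)
    return total

A = np.random.default_rng(5).standard_normal((5, 5))
print(np.isclose(det_expand(A), np.linalg.det(A)))   # True
```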
We can expand as well over the columns, as follows:
The determinant is subject to the column expansion formula
This follows by using the same argument as for the rows.
As a first application of the above methods, we can now prove:
The determinant of the [math]3\times3[/math] matrices is given by
Here is the computation, using the above results:
Thus, we obtain the formula in the statement.
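As a sanity check, here is the [math]3\times3[/math] formula, in its usual Sarrus form, tested numerically on a random matrix:

```python
# Sarrus rule: det = aei + bfg + cdh - ceg - bdi - afh.
import numpy as np

a, b, c, d, e, f, g, h, i = np.random.default_rng(6).standard_normal(9)
A = np.array([[a, b, c], [d, e, f], [g, h, i]])
sarrus = a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

print(np.isclose(sarrus, np.linalg.det(A)))   # True
```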
Let us discuss now the general formula of the determinant, at arbitrary values [math]N\in\mathbb N[/math] of the matrix size, generalizing the formulae that we have at [math]N=2,3[/math]. We will need:
A permutation of [math]\{1,\ldots,N\}[/math] is a bijection, as follows:
There are many possible notations for the permutations, the simplest one consisting in writing the numbers [math]1,\ldots,N[/math], and below them, their permuted versions:
Another method, which is better for most purposes, and faster too (remember that time is money), is to denote the permutations by diagrams, going from top to bottom:
There are many interesting things that can be said about permutations. In what concerns us, we will need the following key result:
The permutations have a signature function
- As [math](-1)^c[/math], where [math]c[/math] is the number of inversions.
- As [math](-1)^t[/math], where [math]t[/math] is the number of transpositions.
- As [math](-1)^o[/math], where [math]o[/math] is the number of odd cycles (that is, cycles of even length).
- As [math](-1)^x[/math], where [math]x[/math] is the number of crossings.
- As the sign of the corresponding permuted basis of [math]\mathbb R^N[/math].
Now back to linear algebra, we can formulate a key result, as follows:
We have the following formula for the determinant,
This follows by recurrence over [math]N\in\mathbb N[/math], as follows:
(1) When developing the determinant over the first column, we obtain a signed sum of [math]N[/math] determinants of size [math](N-1)\times(N-1)[/math]. But each of these determinants can be computed by developing over the first column too, and so on, and we are led to the conclusion that we have a formula as in the statement, with [math]\varepsilon(\sigma)\in\{-1,1\}[/math] being certain coefficients.
(2) But these latter coefficients [math]\varepsilon(\sigma)\in\{-1,1\}[/math] can only be the signatures of the corresponding permutations [math]\sigma\in S_N[/math], with this being something that can be viewed again by recurrence, with any of the definitions (1-5) in Theorem 9.29 for the signature.
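The formula in Theorem 9.30 is, up to the precise indexing convention (which does not affect the value, since [math]\det A=\det A^t[/math]), the usual sum over permutations weighted by signatures, and it can be implemented directly, with the signature computed by counting inversions; a minimal sketch, checked against a library determinant:

```python
# The permutation-sum formula for the determinant, with the signature computed as
# (-1)^(number of inversions), as in Theorem 9.29 (1).
import numpy as np
from itertools import permutations

def signature(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    n = A.shape[0]
    return sum(signature(p) * np.prod([A[p[i], i] for i in range(n)])
               for p in permutations(range(n)))

A = np.random.default_rng(7).standard_normal((4, 4))
print(np.isclose(det_leibniz(A), np.linalg.det(A)))   # True
```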
The above result is something quite tricky, and in order to get familiar with it, there is nothing better than doing some computations. As a first, basic example, in 2 dimensions we recover the usual formula of the determinant, the details being as follows:
In 3 dimensions now, we recover the Sarrus formula:
Let us record as well the following general formula, which was not obvious with our previous theory of the determinant, but which follows from Theorem 9.30:
There are countless other applications of the formula in Theorem 9.30, and we will be back to this, on several occasions. But, most importantly, that formula allows us to deal now with the complex matrices too, by formulating the following statement:
If we define the determinant of a complex matrix [math]A\in M_N(\mathbb C)[/math] to be
This follows by doing some sort of reverse engineering, with respect to what has been done in this section, and we reach the conclusion that [math]\det[/math] has indeed all the good properties that we are familiar with, except of course for the properties at the very beginning of this section, in relation with volumes, which do not extend well to [math]\mathbb C^N[/math].
Good news, this is the end of the general theory that we wanted to develop. We have now in our bag all the needed techniques for computing the determinant.
9d. Diagonalization
Let us discuss now the diagonalization question for linear maps and matrices. The basic diagonalization theory, formulated in terms of matrices, is as follows:
A vector [math]v\in\mathbb C^N[/math] is called an eigenvector of [math]A\in M_N(\mathbb C)[/math], with corresponding eigenvalue [math]\lambda[/math], when [math]A[/math] multiplies by [math]\lambda[/math] in the direction of [math]v[/math]:
This is something which is clear, the idea being as follows:
(1) The first assertion is clear, because the matrix which multiplies each basis element [math]v_i[/math] by a number [math]\lambda_i[/math] is precisely the diagonal matrix [math]D=diag(\lambda_1,\ldots,\lambda_N)[/math].
(2) The second assertion follows from the first one, by changing the basis. We can prove this by a direct computation as well, because we have [math]Pe_i=v_i[/math], and so:
Thus, the matrices [math]A[/math] and [math]PDP^{-1}[/math] coincide, as stated.
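In practice, a diagonalization [math]A=PDP^{-1}[/math] can be obtained numerically from an eigendecomposition routine; a quick sketch, on a random matrix, which is generically diagonalizable:

```python
# Reconstructing A from its eigenvalues and eigenvectors, A = P D P^{-1}.
import numpy as np

A = np.random.default_rng(8).standard_normal((4, 4))
eigvals, P = np.linalg.eig(A)                       # the columns of P are eigenvectors
D = np.diag(eigvals)

print(np.allclose(P @ D @ np.linalg.inv(P), A))     # True
```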
In order to study the diagonalization problem, the idea is that the eigenvectors can be grouped into linear spaces, called eigenspaces, as follows:
Let [math]A\in M_N(\mathbb C)[/math], and for any eigenvalue [math]\lambda\in\mathbb C[/math] define the corresponding eigenspace as being the vector space formed by the corresponding eigenvectors:
We prove the first assertion by recurrence on [math]k\in\mathbb N[/math]. Assume by contradiction that we have a formula as follows, with the scalars [math]c_1,\ldots,c_k[/math] not all zero:
By dividing by one of these scalars, we can assume that our formula is:
Now let us apply [math]A[/math] to this vector. On the left we obtain:
On the right we obtain something different, as follows:
We conclude from this that the following equality must hold:
On the other hand, we know by recurrence that the vectors [math]v_1,\ldots,v_{k-1}[/math] must be linearly independent. Thus, the coefficients on the right and on the left must be equal:
Now since at least one of the numbers [math]c_i[/math] must be nonzero, from [math]\lambda_kc_i=c_i\lambda_i[/math] we obtain [math]\lambda_k=\lambda_i[/math], which is a contradiction. Thus our proof by recurrence of the first assertion is complete. As for the second assertion, this follows from the first one.
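This linear independence is also easy to see numerically: for a random matrix, which generically has [math]N[/math] distinct eigenvalues, the matrix formed by the eigenvectors is invertible. A small sketch:

```python
# Eigenvectors associated to distinct eigenvalues are linearly independent.
import numpy as np

A = np.random.default_rng(10).standard_normal((5, 5))
eigvals, P = np.linalg.eig(A)                        # columns of P are eigenvectors

print(len(np.unique(np.round(eigvals, 8))) == 5)     # the 5 eigenvalues are distinct
print(abs(np.linalg.det(P)) > 1e-10)                 # hence the eigenvectors span C^5
```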
In order to reach now to more advanced results, we can use the characteristic polynomial, which appears via the following fundamental result:
Given a matrix [math]A\in M_N(\mathbb C)[/math], consider its characteristic polynomial:
The first assertion follows from the following computation, using the fact that a linear map is bijective precisely when the determinant of the associated matrix is nonzero:
Regarding now the second assertion, given an eigenvalue [math]\lambda[/math] of our matrix [math]A[/math], consider the dimension [math]d_\lambda=\dim(E_\lambda)[/math] of the corresponding eigenspace. By changing the basis of [math]\mathbb C^N[/math], so that the eigenspace [math]E_\lambda[/math] is spanned by the first [math]d_\lambda[/math] basis elements, our matrix becomes as follows, with [math]B[/math] being a certain smaller matrix:
We conclude that the characteristic polynomial of [math]A[/math] is of the following form:
Thus the multiplicity [math]m_\lambda[/math] of our eigenvalue [math]\lambda[/math], as a root of [math]P[/math], satisfies [math]m_\lambda\geq d_\lambda[/math], and this leads to the conclusion in the statement.
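Numerically, the characteristic polynomial and its roots are available via standard routines; a minimal sketch, checking that the eigenvalues are indeed roots of the characteristic polynomial (numpy uses the monic convention [math]\det(x1_N-A)[/math], which has the same roots as [math]\det(A-x1_N)[/math]):

```python
# The eigenvalues of A are exactly the roots of its characteristic polynomial.
import numpy as np

A = np.random.default_rng(9).standard_normal((4, 4))
coeffs = np.poly(A)                  # coefficients of det(x I - A), highest degree first
eigvals = np.linalg.eigvals(A)

print(np.allclose(np.polyval(coeffs, eigvals), 0))   # each eigenvalue is a root
```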
Now recall that we are over [math]\mathbb C[/math]. We obtain the following result:
Given a matrix [math]A\in M_N(\mathbb C)[/math], consider its characteristic polynomial
then factorize this polynomial, by computing the complex roots, with multiplicities,
This follows by combining the above results. Indeed, by summing the inequalities [math]\dim(E_\lambda)\leq m_\lambda[/math] from Theorem 9.34, we obtain an inequality as follows:
On the other hand, we know from Theorem 9.33 that our matrix is diagonalizable when we have global equality. Thus, we are led to the conclusion in the statement.
As an illustration for all this, which is a must-know computation, we have:
The rotation of angle [math]t\in\mathbb R[/math] in the plane diagonalizes as:
Observe first that, as indicated, unless we are in the cases [math]t=0,\pi[/math], where our rotation is [math]\pm1_2[/math], our rotation is a “true” rotation, having no real eigenvectors in the plane. Fortunately the complex numbers come to the rescue, via the following computation:
We have as well a second complex eigenvector, coming from:
Thus, we are led to the conclusion in the statement.
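Here is a numerical confirmation of this diagonalization, with the eigenvalues of the rotation matrix coming out as [math]e^{\pm it}[/math]; the test angle below is arbitrary:

```python
# The plane rotation R_t has complex eigenvalues e^{it} and e^{-it}.
import numpy as np

t = 0.7
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
eig_sorted = sorted(np.linalg.eigvals(R), key=lambda z: z.imag)   # order by imaginary part

print(np.allclose(eig_sorted, [np.exp(-1j * t), np.exp(1j * t)]))   # e^{-it}, e^{it}
```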
At the level of basic examples of diagonalizable matrices, we first have the following result, which provides us with the “generic” examples:
For a matrix [math]A\in M_N(\mathbb C)[/math] the following conditions are equivalent,
- The eigenvalues are different, [math]\lambda_i\neq\lambda_j[/math],
- The characteristic polynomial [math]P[/math] has simple roots,
- The characteristic polynomial satisfies [math](P,P')=1[/math],
- The resultant of [math]P,P'[/math] is nonzero, [math]R(P,P')\neq0[/math],
- The discriminant of [math]P[/math] is nonzero, [math]\Delta(P)\neq0[/math],
and in this case, the matrix is diagonalizable.
The last assertion holds indeed, due to Theorem 9.35. As for the equivalences in the statement, these are all standard, by using the theory of [math]R,\Delta[/math] from chapter 5.
As already mentioned, one can prove that the matrices having distinct eigenvalues are “generic”, and so the above result basically captures the whole situation. We have in fact the following collection of density results, which are quite advanced:
The following happen, inside [math]M_N(\mathbb C)[/math]:
- The invertible matrices are dense.
- The matrices having distinct eigenvalues are dense.
- The diagonalizable matrices are dense.
These are quite advanced results, which can be proved as follows:
(1) This is clear, intuitively speaking, because the invertible matrices are given by the condition [math]\det A\neq 0[/math]. Thus, the set formed by these matrices appears as the complement of the hypersurface [math]\det A=0[/math], and so must be dense inside [math]M_N(\mathbb C)[/math], as claimed.
(2) Here we can use a similar argument, this time by saying that the set formed by the matrices having distinct eigenvalues appears as the complement of the hypersurface given by [math]\Delta(P_A)=0[/math], and so must be dense inside [math]M_N(\mathbb C)[/math], as claimed.
(3) This follows from (2), via the fact that the matrices having distinct eigenvalues are diagonalizable, that we know from Theorem 9.37. There are of course some other proofs as well, for instance by putting the matrix in Jordan form.
This was the basic theory of linear maps that we will need in what follows. There are of course far more things that can be said, and we will come back to this gradually, when needed, notably in chapter 12 below, when doing advanced calculus.
General references
Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].