The determinant

2a. Matrix inversion

We have seen in the previous chapter that most of the interesting maps [math]f:\mathbb R^N\to\mathbb R^N[/math] that we know, such as the rotations, symmetries and projections, are linear, and can be written in the following form, with [math]A\in M_N(\mathbb R)[/math] being a square matrix:

[[math]] f(v)=Av [[/math]]

In this chapter we develop more general theory for such linear maps. We are mostly motivated by the following fundamental result, which has countless concrete applications, and which is actually at the origin of the whole linear algebra theory:

Theorem

Any linear system of equations

[[math]] \begin{cases} a_{11}x_1+a_{12}x_2+\ldots+a_{1N}x_N\!\!\!&=\ v_1\\ a_{21}x_1+a_{22}x_2+\ldots+a_{2N}x_N\!\!\!&=\ v_2\\ \ \ \vdots\\ a_{N1}x_1+a_{N2}x_2+\ldots+a_{NN}x_N\!\!\!&=\ v_N \end{cases} [[/math]]

can be written in matrix form, as follows,

[[math]] Ax=v [[/math]]

and when [math]A[/math] is invertible, its solution is given by [math]x=A^{-1}v[/math].

Show Proof

With linear algebra conventions, our system reads:

[[math]] \begin{pmatrix} a_{11}&a_{12}&\ldots&a_{1N}\\ a_{21}&a_{22}&\ldots&a_{2N}\\ \vdots&&&\vdots\\ a_{N1}&a_{N2}&\ldots&a_{NN} \end{pmatrix} \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_N \end{pmatrix} =\begin{pmatrix} v_1\\ v_2\\ \vdots\\ v_N \end{pmatrix} [[/math]]

Thus, we are led to the conclusions in the statement.

■

In practice, we are led to the question of inverting the matrices [math]A\in M_N(\mathbb R)[/math]. And this is the same question as inverting the linear maps [math]f:\mathbb R^N\to\mathbb R^N[/math], due to:

Theorem

A linear map [math]f:\mathbb R^N\to\mathbb R^N[/math], written as

[[math]] f(v)=Av [[/math]]

is invertible precisely when [math]A[/math] is invertible, and in this case we have [math]f^{-1}(v)=A^{-1}v[/math].

Show Proof

This is something that we basically know, coming from the fact that, with the notation [math]f_A(v)=Av[/math], we have the following formula:

[[math]] f_Af_B=f_{AB} [[/math]]

Thus, we are led to the conclusion in the statement.

■

In order to study invertibility questions, for matrices or linear maps, let us begin with some examples. In the simplest case, in 2 dimensions, the result is as follows:

Theorem

We have the following inversion formula, for the [math]2\times2[/math] matrices:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}^{-1} =\frac{1}{ad-bc}\begin{pmatrix}d&-b\\ -c&a\end{pmatrix} [[/math]]

When [math]ad-bc=0[/math], the matrix is not invertible.

Show Proof

We have two assertions to be proved, the idea being as follows:

(1) As a first observation, when [math]ad-bc=0[/math] we must have, for some [math]\lambda\in\mathbb R[/math]:

[[math]] b=\lambda a\quad,\quad d=\lambda c [[/math]]

Thus our matrix must be of the following special type:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}=\begin{pmatrix}a&\lambda a\\ a&\lambda c\end{pmatrix} [[/math]]

But in this case the columns are proportional, and so the linear map associated to the matrix is not invertible, and so the matrix itself is not invertible either.

(2) When [math]ad-bc\neq 0[/math], let us look for an inversion formula of the following type:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}^{-1} =\frac{1}{ad-bc}\begin{pmatrix}*&*\\ *&*\end{pmatrix} [[/math]]

We must therefore solve the following equations:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix} \begin{pmatrix}*&*\\ *&*\end{pmatrix}= \begin{pmatrix}ad-bc&0\\ 0&ad-bc\end{pmatrix} [[/math]]

The obvious solution here is as follows:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix} \begin{pmatrix}d&-b\\ -c&a\end{pmatrix}= \begin{pmatrix}ad-bc&0\\ 0&ad-bc\end{pmatrix} [[/math]]

Thus, we are led to the formula in the statement.

■

In order to deal now with the inversion problem in general, for the arbitrary matrices [math]A\in M_N(\mathbb R)[/math], we will use the same method as the one above, at [math]N=2[/math]. Let us write indeed our matrix as follows, with [math]v_1,\ldots,v_N\in\mathbb R^N[/math] being its column vectors:

[[math]] A=[v_1,\ldots,v_N] [[/math]]

We know from the general results from chapter 1 that, in order for [math]A[/math] to be invertible, the vectors [math]v_1,\ldots,v_N[/math] must be linearly independent. Thus, following the observations (1) from the above proof of Theorem 2.3, we are led into the question of understanding when a family of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] are linearly independent.

In order to deal with this latter question, let us introduce the following notion:

Definition

Associated to any vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is the volume

[[math]] {\rm det}^+(v_1\ldots v_N)=vol \lt v_1,\ldots,v_N \gt [[/math]]

of the parallelepiped made by these vectors.

Here the volume is taken in the standard [math]N[/math]-dimensional sense. At [math]N=1[/math] this volume is a length, at [math]N=2[/math] this volume is an area, at [math]N=3[/math] this is the usual 3D volume, and so on. In general, the volume of a body [math]X\subset\mathbb R^N[/math] is by definition the number [math]vol(X)\in[0,\infty][/math] of copies of the unit cube [math]C\subset\mathbb R^N[/math] which are needed for filling [math]X[/math], when allowing this unit cube to be divided into smaller cubes, for the needs of the filling operation.

In order to compute this volume we can use various geometric techniques, and we will see soon that, in what regards the case that we are interested in, namely that of the parallelepipeds [math]P\subset\mathbb R^N[/math], we can basically compute here everything, just by using very basic geometric techniques, essentially based on the Thales theorem.

In relation with our inversion problem, we have the following statement:

Theorem

The quantity [math]{\rm det}^+[/math] that we constructed, regarded as a function of the corresponding square matrices, formed by column vectors,

[[math]] {\rm det}^+:M_N(\mathbb R)\to\mathbb R_+ [[/math]]

has the property that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when [math]{\rm det}^+(A) \gt 0[/math].

Show Proof

This follows from Theorem 2.2, and from the general results from chapter 1, which tell us that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when its column vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] are linearly independent. But this latter condition is equivalent to the fact that we must have the following strict inequality:

[[math]] vol \lt v_1,\ldots,v_N \gt \gt 0 [[/math]]

Thus, we are led to the conclusion in the statement.

■

Summarizing, all this leads us into the explicit computation of [math]{\rm det}^+[/math]. As a first observation, in 1 dimension we obtain the absolute value of the real numbers:

[[math]] {\rm det}^+(a)=|a| [[/math]]

In 2 dimensions now, the computation is non-trivial, and we have the following result, making the link with our main result so far, namely Theorem 2.3:

Theorem

In [math]2[/math] dimensions we have the following formula,

[[math]] {\rm det}^+\begin{pmatrix}a&b\\ c&d\end{pmatrix}=|ad-bc| [[/math]]

with [math]{\rm det}^+:M_2(\mathbb R)\to\mathbb R_+[/math] being the function constructed above.

Show Proof

We must show that the area of the parallelogram formed by [math]\binom{a}{c},\binom{b}{d}[/math] equals [math]|ad-bc|[/math]. We can assume [math]a,b,c,d \gt 0[/math] for simplifying, the proof in general being similar. Moreover, by switching if needed the vectors [math]\binom{a}{c},\binom{b}{d}[/math], we can assume that we have:

[[math]] \frac{a}{c} \gt \frac{b}{d} [[/math]]

According to these conventions, the picture of our parallelogram is as follows:

[[math]] \xymatrix@R=10pt@C=15pt{ &&&\\ c+d&&&&\bullet\\ d&&\bullet\ar@{-}[urr]&&\\ c&&&\bullet\ar@{-}[uur]&\\ &\bullet\ar@{-}[urr]\ar[rrrr]\ar[uuuu]\ar@{-}[uur]&&&&\\ &&\ b\ &\ a\ &a+b} [[/math]]

Now let us slide the upper side downwards left, until we reach the [math]Oy[/math] axis. Our parallelogram, which has not changed its area in this process, becomes:

[[math]] \xymatrix@R=2pt@C=16pt{ &&&\\ c+d&&&&\circ\\ c+x&&&\bullet\ar@{.}[ur]&\\ d&&\circ\ar@{-}[ur]&&&&&\\ x&\bullet\ar@{-}[ur]\ar[uuuu]&&&\\ c&&&\bullet\ar@{-}[uuu]\ar@{.}[uuuur]&\\ &&&&&&\\ &\bullet\ar@{-}[uurr]\ar@{-}[uuu]\ar[rrrr]\ar@{.}[uuuur]&&&&\\ &&\ b\ &\ a\ &a+b} [[/math]]

We can further modify this parallelogram, once again by not altering its area, by sliding the right side downwards, until we reach the [math]Ox[/math] axis:

[[math]] \xymatrix@R=10pt@C=15pt{ &&&\\ c+x&&&\circ&\\ x&\bullet\ar@{.}[urr]\ar[uu]\ar@{-}[rr]&&\bullet\ar@{.}[u]&\\ c&&&\circ\ar@{-}[u]&\\ &\bullet\ar@{.}[urr]\ar@{-}[uu]\ar@{-}[rr]&&\bullet\ar@{-}[u]\ar[rr]&&\\ &&\ b\ &\ a\ &a+b} [[/math]]

Let us compute now the area. Since our two sliding operations have not changed the area of the original parallelogram, this area is given by:

[[math]] A=ax [[/math]]

In order to compute the quantity [math]x[/math], observe that in the context of the first move, we have two similar triangles, according to the following picture:

[[math]] \xymatrix@R=5pt@C=15pt{ &&&&\\ c+d&&&&\bullet\\ &&&&&&\\ d&\circ\ar@{.}[r]\ar[uuu]&\bullet\ar@{.}[rr]\ar@{-}[uurr]&&\circ\ar@{-}[uu]\\ x&\bullet\ar@{-}[u]\ar@{-}[ur]&&&\\ &&&&\\ &\ar@{-}[uu]\ar[rrrr]&&&&\\ &&\ b\ &\ a\ &a+b} [[/math]]

Thus, we are led to the following equation for the number [math]x[/math]:

[[math]] \frac{d-x}{b}=\frac{c}{a} [[/math]]

By solving this equation, we obtain the following value for [math]x[/math]:

[[math]] x=d-\frac{bc}{a} [[/math]]

Thus the area of our parallelogram, or rather of the final rectangle obtained from it, which has the same area as the original parallelogram, is given by:

[[math]] ax=ad-bc [[/math]]

Thus, we are led to the conclusion in the statement.

■

2b. The determinant

All the above is very nice, and we obviously have a beginning of theory here. However, when looking carefully, we can see that our theory has a weakness, because:

In 1 dimension the number [math]a[/math], which is the simplest function of [math]a[/math] itself, is certainly a better quantity than the number [math]|a|[/math].
In 2 dimensions the number [math]ad-bc[/math], which is linear in [math]a,b,c,d[/math], is certainly a better quantity than the number [math]|ad-bc|[/math].

So, let us upgrade now our theory, by constructing a better function, which does the same job, namely checking if the vectors are proportional, of the following type:

[[math]] \det:M_N(\mathbb R)\to\mathbb R\quad,\quad \det=\pm {\rm det}^+ [[/math]]

That is, we would like to have a clever, signed version of [math]\det^+[/math], satisfying:

[[math]] \det(a)=a\quad,\quad \det\begin{pmatrix}a&b\\ c&d\end{pmatrix}=ad-bc [[/math]]

In order to do this, we must come up with a way of splitting the systems of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] into two classes, call them positive and negative. And here, the answer is quite clear, because a bit of thinking leads to the following definition:

Definition

A system of vectors [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is called:

Oriented, if one can continuously pass from the standard basis to it.
Unoriented, otherwise.

The associated sign is [math]+[/math] in the oriented case, and [math]-[/math] in the unoriented case.

As a first example, in 1 dimension the basis consists of the single vector [math]e=1[/math], which can be continuously deformed into any vector [math]a \gt 0[/math]. Thus, the sign is the usual one:

[[math]] sgn(a)= \begin{cases} +&{\rm if}\ a \gt 0\\ -&{\rm if}\ a \lt 0 \end{cases} [[/math]]

Thus, in connection with our original question, we are definitely on the good track, because when multiplying [math]|a|[/math] by this sign we obtain [math]a[/math] itself, as desired:

[[math]] a=sgn(a)|a| [[/math]]

In 2 dimensions now, the explicit formula of the sign is as follows:

Proposition

We have the following formula, valid for any [math]2[/math] vectors in [math]\mathbb R^2[/math],

[[math]] sgn\left[\binom{a}{c},\binom{b}{d}\right]=sgn(ad-bc) [[/math]]

with the sign function on the right being the usual one, in [math]1[/math] dimension.

Show Proof

According to our conventions, the sign of [math]\binom{a}{c},\binom{b}{d}[/math] is as follows:

(1) The sign is [math]+[/math] when these vectors come in this order with respect to the counterclockwise rotation in the plane, around 0.

(2) The sign is [math]-[/math] otherwise, meaning when these vectors come in this order with respect to the clockwise rotation in the plane, around 0.

If we assume now [math]a,b,c,d \gt 0[/math] for simplifying, we are left with comparing the angles having the numbers [math]c/a[/math] and [math]d/b[/math] as tangents, and we obtain in this way:

[[math]] sgn\left[\binom{a}{c},\binom{b}{d}\right]= \begin{cases} +&{\rm if}\ \frac{c}{a} \lt \frac{d}{b}\\ -&{\rm if}\ \frac{c}{a} \gt \frac{d}{b} \end{cases} [[/math]]

But this gives the formula in the statement. The proof in general is similar.

■

Once again, in connection with our original question, we are on the good track, because when multiplying [math]|ad-bc|[/math] by this sign we obtain [math]ad-bc[/math] itself, as desired:

[[math]] ad-bc=sgn(ad-bc)|ad-bc| [[/math]]

Let us look as well into the case [math]N=3[/math]. Things here are quite complicated, and we will discuss this later on. However, we have the following basic result:

Proposition

Consider the standard basis of [math]\mathbb R^3[/math], namely:

[[math]] e_1=\begin{pmatrix}1\\0\\0\end{pmatrix} \qquad,\qquad e_2=\begin{pmatrix}0\\1\\0\end{pmatrix} \qquad,\qquad e_3=\begin{pmatrix}0\\0\\1\end{pmatrix} [[/math]]

We have then the following sign computations:

[math]sgn(e_1,e_2,e_3)=+[/math].
[math]sgn(e_1,e_3,e_2)=-[/math].
[math]sgn(e_2,e_1,e_3)=-[/math].
[math]sgn(e_2,e_3,e_1)=+[/math].
[math]sgn(e_3,e_1,e_2)=+[/math].
[math]sgn(e_3,e_2,e_1)=-[/math].

Show Proof

In each case the problem is whether one can continuously pass from [math](e_1,e_2,e_3)[/math] to the basis in statement, and the computations can be done as follows:

(1) In three of the cases under investigation, namely (2,3,6), one of the vectors is unchanged, and the other two are switched. Thus, we are more or less in 2 dimensions, and since the switch here clearly corresponds to [math]-[/math], the sign in these cases is [math]-[/math].

(2) As for the remaining three cases, namely (1,4,5), here the sign can only be [math]+[/math], since things must be 50-50 between [math]+[/math] and [math]-[/math], say by symmetry reasons. And this is indeed the case, because what we have here are rotations of the standard basis.

■

As already mentioned, we will be back to this later, with a general formula for the sign in 3 dimensions. This formula is quite complicated, the idea being that of making out of the [math]3\times3=9[/math] entries of our vectors a certain quantity, somewhat in the spirit of the one in Proposition 2.8, and then taking the sign of this quantity.

At the level of the general results now, we have:

Proposition

The orientation of a system of vectors changes as follows:

If we switch the sign of a vector, the associated sign switches.
If we permute two vectors, the associated sign switches as well.

Show Proof

Both these assertions are clear from the definition of the sign, because the two operations in question change the orientation of the system of vectors.

■

With the above notion in hand, we can now formulate:

Definition

The determinant of [math]v_1,\ldots,v_N\in\mathbb R^N[/math] is the signed volume

[[math]] \det(v_1\ldots v_N)=\pm vol \lt v_1,\ldots,v_N \gt [[/math]]

of the parallelepiped made by these vectors.

In other words, we are upgrading here Definition 2.4, by adding a sign to the quantity [math]{\rm det}^+[/math] constructed there, as to potentially reach to good additivity properties:

[[math]] \det(v_1\ldots v_N)=\pm {\rm det}^+(v_1\ldots v_N) [[/math]]

In relation with our original inversion problem for the square matrices, this upgrade does not change what we have so far, and we have the following statement:

Theorem

The quantity [math]\det[/math] that we constructed, regarded as a function of the corresponding square matrices, formed by column vectors,

[[math]] \det:M_N(\mathbb R)\to\mathbb R [[/math]]

has the property that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when [math]\det(A)\neq 0[/math].

Show Proof

We know from Theorem 2.5 that a matrix [math]A\in M_N(\mathbb R)[/math] is invertible precisely when [math]{\rm det}^+(A)=|\det A|[/math] is strictly positive, and this gives the result.

■

In the matrix context, we will often use the symbol [math]|\,.\,|[/math] instead of [math]\det[/math]:

[[math]] |A|=\det A [[/math]]

Let us try now to compute the determinant. In 1 dimension we have of course the formula [math]\det(a)=a[/math], because the absolute value fits, and so does the sign:

[[math]] \det(a) =sgn(a)\times|a| =a [[/math]]

In 2 dimensions now, we have the following result:

Theorem

In [math]2[/math] dimensions we have the following formula,

[[math]] \begin{vmatrix}a&b\\ c&d\end{vmatrix}=ad-bc [[/math]]

with [math]|\,.\,|=\det[/math] being the determinant function constructed above.

Show Proof

According to our definition, to the computation in Theorem 2.6, and to sign formula from Proposition 2.8, the determinant of a [math]2\times2[/math] matrix is given by:

[[math]] \begin{eqnarray*} \det\begin{pmatrix}a&b\\ c&d\end{pmatrix} &=&sgn\left[\binom{a}{c},\binom{b}{d}\right]\times {\rm det}^+\begin{pmatrix}a&b\\ c&d\end{pmatrix}\\ &=&sgn\left[\binom{a}{c},\binom{b}{d}\right]\times|ad-bc|\\ &=&sgn(ad-bc)\times|ad-bc|\\ &=&ad-bc \end{eqnarray*} [[/math]]

Thus, we have obtained the formula in the statement.

■

2c. Basic properties

In order to discuss now arbitrary dimensions, we will need a number of theoretical results. Here is a first series of formulae, coming straight from the definitions:

Theorem

The determinant has the following properties:

When multiplying by scalars, the determinant gets multiplied as well:
[[math]] \det(\lambda_1v_1,\ldots,\lambda_Nv_N)=\lambda_1\ldots\lambda_N\det(v_1,\ldots,v_N) [[/math]]
When permuting two columns, the determinant changes the sign:
[[math]] \det(\ldots,u,\ldots,v,\ldots)=-\det(\ldots,v,\ldots,u,\ldots) [[/math]]
The determinant [math]\det(e_1,\ldots,e_N)[/math] of the standard basis of [math]\mathbb R^N[/math] is [math]1[/math].

Show Proof

All this is clear from definitions, as follows:

(1) This follows from definitions, and from Proposition 2.10 (1).

(2) This follows as well from definitions, and from Proposition 2.10 (2).

(3) This is clear from our definition of the determinant.

■

As an application of the above result, we have:

Theorem

The determinant of a diagonal matrix is given by:

[[math]] \begin{vmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N\end{vmatrix}=\lambda_1\ldots\lambda_N [[/math]]

That is, we obtain the product of diagonal entries, or of eigenvalues.

Show Proof

The formula in the statement is clear by using the rules (1) and (3) in Theorem 2.14 above, which in matrix terms give:

[[math]] \begin{eqnarray*} \begin{vmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N\end{vmatrix} &=&\lambda_1\ldots\lambda_N \begin{vmatrix} 1\\ &\ddots\\ &&1\end{vmatrix}\\ &=&\lambda_1\ldots\lambda_N \end{eqnarray*} [[/math]]

As for the last assertion, this is rather a remark.

■

The above result is very useful, and we will see in a moment that, more generally, the determinant of any diagonalizable matrix is the product of its eigenvalues.

In order to reach now to a more advanced theory, let us adopt the linear map point of view. In this setting, the definition of the determinant reformulates as follows:

Theorem

Given a linear map, written as [math]f(v)=Av[/math], its “inflation coefficient”, obtained as the signed volume of the image of the unit cube, is given by:

[[math]] I_f=\det A [[/math]]

More generally, [math]I_f[/math] is the inflation ratio of any parallelepiped in [math]\mathbb R^N[/math], via the transformation [math]f[/math]. In particular [math]f[/math] is invertible precisely when [math]\det A\neq0[/math].

Show Proof

The only non-trivial thing in all this is the fact that the inflation coefficient [math]I_f[/math], as defined above, is independent of the choice of the parallelepiped. But this is a generalization of the Thales theorem, which follows from the Thales theorem itself.

■

As a first application of the above linear map viewpoint, we have:

Theorem

We have the following formula, valid for any matrices [math]A,B[/math]:

[[math]] \det(AB)=\det A\cdot\det B [[/math]]

In particular, we have [math]\det(AB)=\det(BA)[/math].

Show Proof

The decomposition formula in the statement follows by using the associated linear maps, which multiply as follows:

[[math]] f_{AB}=f_Af_B [[/math]]

Indeed, when computing the determinant, by using the “inflation coefficient” viewpoint from Theorem 2.16, we obtain the same thing on both sides. As for the formula [math]\det(AB)=\det(BA)[/math], this is clear from the first formula, which is symmetric in [math]A,B[/math].

■

Getting back now to explicit computations, we have the following key result:

Theorem

The determinant of a diagonalizable matrix

[[math]] A\sim\begin{pmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N\end{pmatrix} [[/math]]

is the product of its eigenvalues, [math]\det A=\lambda_1\ldots\lambda_N[/math].

Show Proof

We know that a diagonalizable matrix can be written in the form [math]A=PDP^{-1}[/math], with [math]D=diag(\lambda_1,\ldots,\lambda_N)[/math]. Now by using Theorem 2.17, we obtain:

[[math]] \begin{eqnarray*} \det A &=&\det(PDP^{-1})\\ &=&\det(DP^{-1}P)\\ &=&\det D\\ &=&\lambda_1\ldots\lambda_N \end{eqnarray*} [[/math]]

Thus, we are led to the formula in the statement.

■

Here is another important result, which is very useful for diagonalization:

Theorem

The eigenvalues of a matrix [math]A\in M_N(\mathbb R)[/math] are the roots of

[[math]] P(x)=\det(A-x1_N) [[/math]]

called characteristic polynomial of the matrix.

Show Proof

We have the following computation, using the fact that a linear map is bijective precisely when the determinant of the associated matrix is nonzero:

[[math]] \begin{eqnarray*} \exists v,Av=\lambda v &\iff&\exists v,(A-\lambda 1_N)v=0\\ &\iff&\det(A-\lambda 1_N)=0 \end{eqnarray*} [[/math]]

Thus, we are led to the conclusion in the statement.

■

Here are now some other computations, once again in arbitrary dimensions:

Proposition

We have the following results:

The determinant of an orthogonal matrix must be [math]\pm1[/math].
The determinant of a projection must be [math]0[/math] or [math]1[/math].

Show Proof

These are elementary results, the idea being as follows:

(1) Here the determinant must be indeed [math]\pm1[/math], because the orthogonal matrices map the unit cube to a copy of the unit cube.

(2) Here the determinant is in general 0, because the projections flatten the unit cube, unless we have the identity, where the determinant is 1.

■

In general now, at the theoretical level, we have the following key result:

Theorem

The determinant has the additivity property

[[math]] \det(\ldots,u+v,\ldots) =\det(\ldots,u,\ldots) +\det(\ldots,v,\ldots) [[/math]]

valid for any choice of the vectors involved.

Show Proof

This follows by doing some elementary geometry, in the spirit of the computations in the proof of Theorem 2.6, as follows:

(1) We can either use the Thales theorem, and then compute the volumes of all the parallelepipeds involved, by using basic algebraic formulae.

(2) Or we can solve the problem in “puzzle” style, the idea being to cut the big parallelepiped, and then recover the small ones, after some manipulations.

(3) We can do as well something hybrid, consisting in deforming the parallelepipeds involved, without changing their volumes, and then cutting and gluing.

■

As a basic application of the above result, we have:

Theorem

We have the following results:

The determinant of a diagonal matrix is the product of diagonal entries.
The same is true for the upper triangular matrices.
The same is true for the lower triangular matrices.

Show Proof

All this can be deduced by using our various general formulae, as follows:

(1) This is something that we already know, from Theorem 2.15.

(2) This follows by using Theorem 2.14 and Theorem 2.21, then (1), as follows:

[[math]] \begin{eqnarray*} \begin{vmatrix} \lambda_1&&&*\\ &\lambda_2\\ &&\ddots\\ 0&&&\lambda_N\end{vmatrix} &=&\begin{vmatrix} \lambda_1&0&&*\\ &\lambda_2\\ &&\ddots\\ 0&&&\lambda_N\end{vmatrix}\\ &&\vdots\\ &&\vdots\\ &=&\begin{vmatrix} \lambda_1&&&0\\ &\lambda_2\\ &&\ddots\\ 0&&&\lambda_N\end{vmatrix}\\ &=&\lambda_1\ldots\lambda_N \end{eqnarray*} [[/math]]

(3) This follows as well from Theorem 2.14 and Theorem 2.21, then (1), by proceeding this time from right to left, from the last column towards the first column.

■

We can see from the above that the rules in Theorem 2.14 and Theorem 2.21 are quite powerful, taken altogether. For future reference, let us record these rules:

Theorem

The determinant has the following properties:

When adding two columns, the determinants get added:
[[math]] \det(\ldots,u+v,\ldots) =\det(\ldots,u,\ldots) +\det(\ldots,v,\ldots) [[/math]]
When multiplying columns by scalars, the determinant gets multiplied:
[[math]] \det(\lambda_1v_1,\ldots,\lambda_Nv_N)=\lambda_1\ldots\lambda_N\det(v_1,\ldots,v_N) [[/math]]
When permuting two columns, the determinant changes the sign:
[[math]] \det(\ldots,u,\ldots,v,\ldots)=-\det(\ldots,v,\ldots,u,\ldots) [[/math]]
The determinant [math]\det(e_1,\ldots,e_N)[/math] of the standard basis of [math]\mathbb R^N[/math] is [math]1[/math].

Show Proof

This is something that we already know, which follows by putting together the various formulae from Theorem 2.14 and Theorem 2.21.

■

As an important theoretical result now, which will ultimately lead to an algebraic reformulation of the whole determinant problematics, we have:

Theorem

The determinant of square matrices is the unique map

[[math]] \det:M_N(\mathbb R)\to\mathbb R [[/math]]

satisfying the conditions in Theorem 2.23.

Show Proof

This can be done in two steps, as follows:

(1) Our first claim is that any map [math]\det':M_N(\mathbb R)\to\mathbb R[/math] satisfying the conditions in Theorem 2.23 must coincide with [math]\det[/math] on the upper triangular matrices. But this is clear from the proof of Theorem 2.22, which only uses the rules in Theorem 2.23.

(2) Our second claim is that we have [math]\det'=\det[/math], on all matrices. But this can be proved by putting the matrix in upper triangular form, by using operations on the columns, in the spirit of the manipulations from the proof of Theorem 2.22.

■

Here is now another important theoretical result:

Theorem

The determinant is subject to the row expansion formula

[[math]] \begin{eqnarray*} \begin{vmatrix}a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{NN}\end{vmatrix} &=&a_{11}\begin{vmatrix}a_{22}&\ldots&a_{2N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix} -a_{12}\begin{vmatrix}a_{21}&a_{23}&\ldots&a_{2N}\\ \vdots&\vdots&&\vdots\\ a_{N1}&a_{N3}&\ldots&a_{NN}\end{vmatrix}\\ &&+\ldots\ldots +(-1)^{N+1}a_{1N}\begin{vmatrix}a_{21}&\ldots&a_{2,N-1}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{N,N-1}\end{vmatrix} \end{eqnarray*} [[/math]]

and this method fully computes it, by recurrence.

Show Proof

This follows from the fact that the formula in the statement produces a certain function [math]\det:M_N(\mathbb R)\to\mathbb R[/math], which has the 4 properties in Theorem 2.23.

■

We can expand as well over the columns, as follows:

Theorem

The determinant is subject to the column expansion formula

[[math]] \begin{eqnarray*} \begin{vmatrix}a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{NN}\end{vmatrix} &=&a_{11}\begin{vmatrix}a_{22}&\ldots&a_{2N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix} -a_{21}\begin{vmatrix}a_{12}&\ldots&a_{1N}\\ a_{32}&\ldots&a_{3N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix}\\ &&+\ldots\ldots +(-1)^{N+1}a_{N1}\begin{vmatrix}a_{12}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N-1,2}&\ldots&a_{N-1,N}\end{vmatrix} \end{eqnarray*} [[/math]]

and this method fully computes it, by recurrence.

Show Proof

This follows by using the same argument as for the rows.

■

We can now complement Theorem 2.23 with a similar result for the rows:

Theorem

The determinant has the following properties:

When adding two rows, the determinants get added:
[[math]] \det\begin{pmatrix}\vdots\\ u+v\\ \vdots\end{pmatrix} =\det\begin{pmatrix}\vdots\\ u\\ \vdots\end{pmatrix} +\det\begin{pmatrix}\vdots \\ v\\ \vdots\end{pmatrix} [[/math]]
When multiplying row by scalars, the determinant gets multiplied:
[[math]] \det\begin{pmatrix}\lambda_1v_1\\ \vdots\\ \lambda_Nv_N\end{pmatrix} =\lambda_1\ldots\lambda_N\det\begin{pmatrix}v_1\\ \vdots\\ v_N\end{pmatrix} [[/math]]
When permuting two rows, the determinant changes the sign.

Show Proof

This follows indeed by using the using various formulae established above, and is best seen by using the column expansion formula from Theorem 2.26.

■

We can see from the above that the determinant is the subject to many interesting formulae, and that some of these formulae, when taken altogether, uniquely determine it. In all this, what is the most luminous is certainly the definition of the determinant as a volume. As for the second most luminous of our statements, this is Theorem 2.24, which is something a bit abstract, but both beautiful and useful. So, as a final theoretical statement now, here is an alternative reformulation of Theorem 2.24:

Theorem

The determinant of the systems of vectors

[[math]] \det:\mathbb R^N\times\ldots\times\mathbb R^N\to\mathbb R [[/math]]

is multilinear, alternate and unital, and unique with these properties.

Show Proof

This is a fancy reformulation of Theorem 2.24, with the various properties of [math]\det[/math] from the statement being those from Theorem 2.23.

■

As a conclusion to all this, we have now a full theory for the determinant, and we can freely use all the above results, definitions and theorems alike, and even start forgetting what is actually definition, and what is theorem.

2d. Sarrus and beyond

As a first application of the above methods, we can now prove:

Theorem

The determinant of the [math]3\times3[/math] matrices is given by

[[math]] \begin{vmatrix}a&b&c\\ d&e&f\\ g&h&i\end{vmatrix}=aei+bfg+cdh-ceg-bdi-afh [[/math]]

which can be memorized by using Sarrus' triangle method,

[[math]] \begin{eqnarray*} \det &=&\begin{pmatrix}*&&\\ &*&\\ &&*\end{pmatrix} +\begin{pmatrix}&*&\\ &&*\\ *&&\end{pmatrix} +\begin{pmatrix}&&*\\ *&&\\ &*&\end{pmatrix}\\ &-&\begin{pmatrix}&&*\\ &*&\\ *&&\end{pmatrix} +\begin{pmatrix}&*&\\ *&&\\ &&*\end{pmatrix} +\begin{pmatrix}*&&\\ &&*\\ &*&\end{pmatrix} \end{eqnarray*} [[/math]]

“triangles parallel to the diagonal, minus triangles parallel to the antidiagonal”.

Show Proof

Here is the computation, using Theorem 2.25:

[[math]] \begin{eqnarray*} \begin{vmatrix}a&b&c\\ d&e&f\\ g&h&i\end{vmatrix} &=&a\begin{vmatrix}e&f\\h&i\end{vmatrix} -b\begin{vmatrix}d&f\\g&i\end{vmatrix} +c\begin{vmatrix}d&e\\g&h\end{vmatrix}\\ &=&a(ei-fh)-b(di-fg)+c(dh-eg)\\ &=&aei-afh-bdi+bfg+cdh-ceg\\ &=&aei+bfg+cdh-ceg-bdi-afh \end{eqnarray*} [[/math]]

Thus, we obtain the formula in the statement.

■

As a first application, let us go back to the inversion problem for the [math]3\times3[/math] matrices, that we left open in the above. We can now solve this problem, as follows:

Theorem

The inverses of the [math]3\times3[/math] matrices are given by

[[math]] \begin{pmatrix}a&b&c\\ d&e&f\\ g&h&i\end{pmatrix}^{-1} =\frac{1}{D}\begin{pmatrix}ei-fh&ch-bi&bf-ce\\ fg-di&ai-cg&cd-af\\ dh-eg&bg-ah&ae-bd\end{pmatrix} [[/math]]

with [math]D[/math] being the determimant. When [math]D=0[/math], the matrix is not invertible.

Show Proof

We can use here the same method as for the [math]2\times 2[/math] matrices. To be more precise, in order for the matrix to be invertible, we must have:

[[math]] D\neq0 [[/math]]

The trick now is to look for solutions of the following problem:

[[math]] \begin{pmatrix}a&b&c\\ d&e&f\\ g&h&i\end{pmatrix} \begin{pmatrix}*&*&*\\ *&*&*\\ *&*&*\end{pmatrix} =\begin{pmatrix}D&0&0\\ 0&D&0\\ 0&0&D\end{pmatrix} [[/math]]

We know from Theorem 2.29 that the determinant is given by:

[[math]] D=aei+bfg+cdh-ceg-bdi-afh [[/math]]

But this leads, via some obvious choices, to the following solution:

[[math]] \begin{pmatrix}*&*&*\\ *&*&*\\ *&*&*\end{pmatrix} =\begin{pmatrix}ei-fh&ch-bi&bf-ce\\ fg-di&ai-cg&cd-af\\ dh-eg&bg-ah&ae-bd\end{pmatrix} [[/math]]

Thus, by rescaling, we obtain the formula in the statement.

■

In fact, we can now fully solve the inversion problem, as follows:

Theorem

The inverse of a square matrix, having nonzero determinant,

[[math]] A=\begin{pmatrix}a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{NN}\end{pmatrix} [[/math]]

is given by the following formula,

[[math]] A^{-1}=\frac{1}{\det A} \begin{pmatrix} \det A^{(11)}&-\det A^{(21)}&\det A^{(31)}&\ldots\\ -\det A^{(12)}&\det A^{(22)}&-\det A^{(32)}&\ldots\\ \det A^{(13)}&-\det A^{(23)}&\det A^{(33)}&\ldots\\ \vdots&\vdots&\vdots& \end{pmatrix} [[/math]]

where [math]A^{(ij)}[/math] is the matrix [math]A[/math], with the [math]i[/math]-th row and [math]j[/math]-th column removed.

Show Proof

This follows indeed by using the row expansion formula from Theorem 2.25, which in terms of the matrix [math]A^{-1}[/math] in the statement reads [math]AA^{-1}=1[/math].

■

In practice, the above result leads to the following algorithm, which is quite easy to memorize, for computing the inverse:

(1) Delete rows and columns, and compute the corresponding determinants.

(2) Transpose, and add checkered signs.

(3) Divide by the determinant.

Observe that this generalizes our previous computations at [math]N=2,3[/math]. As an illustration, consider an arbitrary [math]2\times2[/math] matrix, written as follows:

[[math]] A=\begin{pmatrix}a&b\\ c&d\end{pmatrix} [[/math]]

By deleting rows and columns we obtain [math]1\times1[/math] matrices, and so the matrix formed by the determinants [math]\det(A^{(ij)})[/math] is as follows:

[[math]] M=\begin{pmatrix}d&c\\ b&a\end{pmatrix} [[/math]]

Now by transposing, adding checkered signs and dividing by [math]\det A[/math], we obtain:

[[math]] A^{-1}=\frac{1}{ad-bc}\begin{pmatrix}d&-b\\ -c&a\end{pmatrix} [[/math]]

Similarly, at [math]N=3[/math] what we obtain is the Sarrus formula, from Theorem 2.29.

As a new application now, let us record the following result, at [math]N=4[/math]:

Theorem

The determinant of the [math]4\times4[/math] matrices is given by

[[math]] \begin{eqnarray*} &&\begin{vmatrix}a_1&a_2&a_3&a_4\\ b_1&b_2&b_3&b_4\\ c_1&c_2&c_3&c_4\\ d_1&d_2&d_3&d_4\end{vmatrix}\\ &=&a_1b_2c_3d_4-a_1b_2c_4d_3-a_1b_3c_2d_4+a_1b_3c_4d_2+a_1b_4c_2d_3-a_1b_4c_3d_2\\ &-&a_2b_1c_3d_4+a_2b_1c_4d_3+a_2b_3c_1d_4-a_2b_3c_4d_1-a_2b_4c_1d_3+a_2b_4c_3d_1\\ &+&a_3b_1c_2d_4+a_3b_1c_4d_2-a_3b_2c_1d_4+a_3b_2c_4d_1+a_3b_4c_1d_2-a_3b_4c_2d_1\\ &-&a_4b_1c_2d_3+a_4b_1c_3d_2-a_4b_2c_1d_3-a_4b_2c_3d_1-a_4b_3c_1d_2+a_4b_3c_2d_1 \end{eqnarray*} [[/math]]

and the formula of the inverse is as follows, involving [math]16[/math] Sarrus determinants,

[[math]] A^{-1}=\frac{1}{\det A} \begin{pmatrix} \det A^{(11)}&-\det A^{(21)}&\det A^{(31)}&-\det A^{(41)}\\ -\det A^{(12)}&\det A^{(22)}&-\det A^{(32)}&\det A^{(42)}\\ \det A^{(13)}&-\det A^{(23)}&\det A^{(33)}&-\det A^{(43)}\\ -\det A^{(14)}&\det A^{(24)}&-\det A^{(34)}&\det A^{(44)} \end{pmatrix} [[/math]]

where [math]A^{(ij)}[/math] is the matrix [math]A[/math], with the [math]i[/math]-th row and [math]j[/math]-th column removed.

Show Proof

The formula for the determinant follows by developing over the first row, then by using the Sarrus formula, for each of the 4 smaller determinants which appear:

[[math]] \begin{eqnarray*} \begin{vmatrix}a_1&a_2&a_3&a_4\\ b_1&b_2&b_3&b_4\\ c_1&c_2&c_3&c_4\\ d_1&d_2&d_3&d_4\end{vmatrix} &=&a_1\begin{vmatrix}b_2&b_3&b_4\\ c_2&c_3&c_4\\ d_2&d_3&d_4\end{vmatrix} -a_2\begin{vmatrix}b_1&b_3&b_4\\ c_1&c_3&c_4\\ d_1&d_3&d_4\end{vmatrix}\\ &+&a_3\begin{vmatrix}b_1&b_2&b_4\\ c_1&c_2&c_4\\ d_1&d_2&d_4\end{vmatrix} -a_4\begin{vmatrix}b_1&b_2&b_3\\ c_1&c_2&c_3\\ d_1&d_2&d_3\end{vmatrix} \end{eqnarray*} [[/math]]

As for the formula of the inverse, this is something that we already know.

■

Let us discuss now the general formula of the determinant, at arbitrary values [math]N\in\mathbb N[/math] of the matrix size, generalizing those that we have at [math]N=2,3,4[/math]. We will need:

Definition

A permutation of [math]\{1,\ldots,N\}[/math] is a bijection, as follows:

[[math]] \sigma:\{1,\ldots,N\}\to\{1,\ldots,N\} [[/math]]

The set of such permutations is denoted [math]S_N[/math].

There are many possible notations for the permutations, the basic one consisting in writing the numbers [math]1,\ldots,N[/math], and below them, their permuted versions:

[[math]] \sigma=\begin{pmatrix} 1&2&3&4&5\\ 2&1&4&5&3 \end{pmatrix} [[/math]]

Another method, which is faster, is by denoting the permutations as diagrams, acting from top to bottom:

[[math]] \xymatrix@R=3mm@C=3.5mm{ &\ar@{-}[ddr]&\ar@{-}[ddl]&\ar@{-}[ddrr]&\ar@{-}[ddl]&\ar@{-}[ddl]\\ \sigma=\\ &&&&&} [[/math]]

Here are some basic properties of the permutations:

Theorem

The permutations have the following properties:

There are [math]N![/math] of them.
They are stable by composition, and inversion.

Show Proof

In order to construct a permutation [math]\sigma\in S_N[/math], we have:

-- [math]N[/math] choices for the value of [math]\sigma(N)[/math]. -- [math](N-1)[/math] choices for the value of [math]\sigma(N-1)[/math]. -- [math](N-2)[/math] choices for the value of [math]\sigma(N-2)[/math]. [math]\vdots[/math] -- and so on, up to 1 choice for the value of [math]\sigma(1)[/math].

Thus, we have [math]N![/math] choices, as claimed. As for the second assertion, this is clear.

■

We will need the following key result:

Theorem

The permutations have a signature function

[[math]] \varepsilon:S_N\to\{\pm1\} [[/math]]

which can be defined in the following equivalent ways:

As [math](-1)^c[/math], where [math]c[/math] is the number of inversions.
As [math](-1)^t[/math], where [math]t[/math] is the number of transpositions.
As [math](-1)^o[/math], where [math]o[/math] is the number of odd cycles.
As [math](-1)^x[/math], where [math]x[/math] is the number of crossings.
As the sign of the corresponding permuted basis of [math]\mathbb R^N[/math].

Show Proof

{{{4}}}

■

We can now formulate a key result, as follows:

Theorem

We have the following formula for the determinant,

[[math]] \det A=\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{1\sigma(1)}\ldots A_{N\sigma(N)} [[/math]]

with the signature function being the one introduced above.

Show Proof

This follows by recurrence over [math]N\in\mathbb N[/math], as follows:

(1) When developing the determinant over the first column, we obtain a signed sum of [math]N[/math] determinants of size [math](N-1)\times(N-1)[/math]. But each of these determinants can be computed by developing over the first column too, and so on, and we are led to the conclusion that we have a formula as in the statement, with [math]\varepsilon(\sigma)\in\{-1,1\}[/math] being certain coefficients.

(2) But these latter coefficients [math]\varepsilon(\sigma)\in\{-1,1\}[/math] can only be the signatures of the corresponding permutations [math]\sigma\in S_N[/math], with this being something that can be viewed again by recurrence, with either of the definitions (1-5) in Theorem 2.35 for the signature.

■

The above result is something quite tricky, and in order to get familiar with it, there is nothing better than doing some computations. As a first, basic example, in 2 dimensions we recover the usual formula of the determinant, the details being as follows:

[[math]] \begin{eqnarray*} \begin{vmatrix}a&b\\ c&d\end{vmatrix} &=&\varepsilon(|\,|)\cdot ad+\varepsilon(\slash\hskip-2mm\backslash)\cdot cb\\ &=&1\cdot ad+(-1)\cdot cb\\ &=&ad-bc \end{eqnarray*} [[/math]]

In 3 dimensions now, we recover the Sarrus formula:

[[math]] \begin{vmatrix}a&b&c\\ d&e&f\\ g&h&i\end{vmatrix}=aei+bfg+cdh-ceg-bdi-afh [[/math]]

Observe that the triangles in the Sarrus formula correspond to the permutations of [math]\{1,2,3\}[/math], and their signs correspond to the signatures of these permutations:

[[math]] \begin{eqnarray*} \det &=&\begin{pmatrix}*&&\\ &*&\\ &&*\end{pmatrix} +\begin{pmatrix}&*&\\ &&*\\ *&&\end{pmatrix} +\begin{pmatrix}&&*\\ *&&\\ &*&\end{pmatrix}\\ &-&\begin{pmatrix}&&*\\ &*&\\ *&&\end{pmatrix} +\begin{pmatrix}&*&\\ *&&\\ &&*\end{pmatrix} +\begin{pmatrix}*&&\\ &&*\\ &*&\end{pmatrix} \end{eqnarray*} [[/math]]

Also, in 4 dimensions, we recover the formula that we already know, as follows:

Theorem

The determinant of the [math]4\times4[/math] matrices is given by

[[math]] \begin{eqnarray*} &&\begin{vmatrix}a_1&a_2&a_3&a_4\\ b_1&b_2&b_3&b_4\\ c_1&c_2&c_3&c_4\\ d_1&d_2&d_3&d_4\end{vmatrix}\\ &=&a_1b_2c_3d_4-a_1b_2c_4d_3-a_1b_3c_2d_4+a_1b_3c_4d_2+a_1b_4c_2d_3-a_1b_4c_3d_2\\ &-&a_2b_1c_3d_4+a_2b_1c_4d_3+a_2b_3c_1d_4-a_2b_3c_4d_1-a_2b_4c_1d_3+a_2b_4c_3d_1\\ &+&a_3b_1c_2d_4+a_3b_1c_4d_2-a_3b_2c_1d_4+a_3b_2c_4d_1+a_3b_4c_1d_2-a_3b_4c_2d_1\\ &-&a_4b_1c_2d_3+a_4b_1c_3d_2-a_4b_2c_1d_3-a_4b_2c_3d_1-a_4b_3c_1d_2+a_4b_3c_2d_1 \end{eqnarray*} [[/math]]

with the generic term being of the following form, with [math]\sigma\in S_4[/math],

[[math]] \pm a_{\sigma(1)}b_{\sigma(2)}c_{\sigma(3)}d_{\sigma(4)} [[/math]]

and with the sign being [math]\varepsilon(\sigma)[/math], computable by using Theorem 2.35.

Show Proof

We can indeed recover this formula as well as a particular case of Theorem 2.36. To be more precise, the permutations in the statement are listed according to the lexicographic order, and the computation of the corresponding signatures is something elementary, by using the various rules from Theorem 2.35.

■

As another application, we have the following key result:

Theorem

We have the formula

[[math]] \det A=\det A^t [[/math]]

valid for any square matrix [math]A[/math].

Show Proof

This follows from the formula in Theorem 2.36. Indeed, we have:

[[math]] \begin{eqnarray*} \det A^t &=&\sum_{\sigma\in S_N}\varepsilon(\sigma)(A^t)_{1\sigma(1)}\ldots(A^t)_{N\sigma(N)}\\ &=&\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{\sigma(1)1}\ldots A_{\sigma(N)N}\\ &=&\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{1\sigma^{-1}(1)}\ldots A_{N\sigma^{-1}(N)}\\ &=&\sum_{\sigma\in S_N}\varepsilon(\sigma^{-1})A_{1\sigma^{-1}(1)}\ldots A_{N\sigma^{-1}(N)}\\ &=&\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{1\sigma(1)}\ldots A_{N\sigma(N)}\\ &=&\det A \end{eqnarray*} [[/math]]

Thus, we are led to the formula in the statement.

■

Good news, this is the end of the general theory that we wanted to develop. We have now in our bag all the needed techniques for computing the determinant.

Here is however a nice and important example of a determinant, whose computation uses some interesting new techniques, going beyond what has been said above:

Theorem

We have the Vandermonde determinant formula

[[math]] \begin{vmatrix} 1&1&1&\ldots\ldots&1\\ x_1&x_2&x_3&\ldots\ldots&x_N\\ x_1^2&x_2^2&x_3^2&\ldots\ldots&x_N\\ \vdots&\vdots&\vdots&&\vdots\\ \vdots&\vdots&\vdots&&\vdots\\ x_1^{N-1}&x_2^{N-1}&x_3^{N-1}&\ldots\ldots&x_N^{N-1} \end{vmatrix} =\prod_{i \lt j}(x_i-x_j) [[/math]]

valid for any [math]x_1,\ldots,x_N\in\mathbb R[/math].

Show Proof

By expanding over the columns, we see that the determinant in question, say [math]D[/math], is a polynomial in the variables [math]x_1,\ldots,x_N[/math], having degree [math]N-1[/math] in each variable. Now observe that when setting [math]x_i=x_j[/math], for some indices [math]i\neq j[/math], our matrix will have two identical columns, and so its determinant [math]D[/math] will vanish:

[[math]] x_i=x_j\implies D=0 [[/math]]

But this gives us the key to the computation of [math]D[/math]. Indeed, [math]D[/math] must be divisible by [math]x_i-x_j[/math] for any [math]i\neq j[/math], and so we must have a formula of the following type:

[[math]] D=c\prod_{i \lt j}(x_i-x_j) [[/math]]

Moreover, since the product on the right is, exactly as [math]D[/math] itself, a polynomial in the variables [math]x_1,\ldots,x_N[/math], having degree [math]N-1[/math] in each variable, we conclude that the quantity [math]c[/math] must be a constant, not depending on any of the variables [math]x_1,\ldots,x_N[/math]:

[[math]] c\in\mathbb R [[/math]]

In order to finish the computation, it remains to find the value of this constant [math]c[/math]. But this can be done for instance by recurrence, and we obtain:

[[math]] c=1 [[/math]]

Thus, we are led to the formula in the statement.

■

Getting back now to generalities, and to what we want to do with our linear algebra theory, now that we are experts in the computation of the determinant, we should investigate the next problem, namely the diagonalization one.

And here, we know from Theorem 2.19 that the eigenvalues of a matrix [math]A\in M_N(\mathbb R)[/math] appear as roots of the characteristic polynomial:

[[math]] P(x)=\det(A-x1_N) [[/math]]

Thus, with the determinant theory developed above, we can in principle compute these eigenvalues, and solve the diagonalization problem afterwards.

The problem, however, is that certain real matrices can have characteristic polynomials of type [math]P(x)=x^2+1[/math], and this suggests that these matrices might be not diagonalizable over [math]\mathbb R[/math], but be diagonalizable over [math]\mathbb C[/math] instead. And so, before getting into diagonalization problems, we must upgrade our theory, and talk about complex matrices. We will do this in the next chapter, and afterwards, we will go back to the diagonalization problem.

General references

Banica, Teo (2024). "Linear algebra and group theory". arXiv:2206.09283 [math.CO].