Complex matrices

3a. Complex numbers

We have seen that the study of the real matrices [math]A\in M_N(\mathbb R)[/math] suggests the use of the complex numbers. Indeed, even simple matrices like the [math]2\times2[/math] ones can, at least in a formal sense, have complex eigenvalues. In what follows we discuss the complex matrices [math]A\in M_N(\mathbb C)[/math]. We will see that the theory here is much more complete than in the real case. As an application, we will solve in this way problems left open in the real case.

Let us begin with the complex numbers. There is a lot of magic here, and we will carefully explain this material. Their definition is as follows:

Definition

The complex numbers are variables of the form

[[math]] x=a+ib [[/math]]

which add in the obvious way, and multiply according to the following rule:

[[math]] i^2=-1 [[/math]]

In other words, we consider variables as above, without bothering for the moment with their precise meaning. Now consider two such complex numbers:

[[math]] x=a+ib\quad,\quad y=c+id [[/math]]

The formula for the sum is then the obvious one, as follows:

[[math]] x+y=(a+c)+i(b+d) [[/math]]

As for the formula of the product, by using the rule [math]i^2=-1[/math], we obtain:

[[math]] \begin{eqnarray*} xy &=&(a+ib)(c+id)\\ &=&ac+iad+ibc+i^2bd\\ &=&ac+iad+ibc-bd\\ &=&(ac-bd)+i(ad+bc) \end{eqnarray*} [[/math]]

Thus, the complex numbers as introduced above are well-defined. The multiplication formula is of course quite tricky, and hard to memorize, but we will see later some alternative ways, which are more conceptual, for performing the multiplication.

The advantage of using the complex numbers comes from the fact that the equation [math]x^2=1[/math] has now a solution, [math]x=i[/math]. In fact, this equation has two solutions, namely:

[[math]] x=\pm i [[/math]]

This is of course very good news. More generally, we have the following result:

Theorem

The complex solutions of [math]ax^2+bx+c=0[/math] with [math]a,b,c\in\mathbb R[/math] are

[[math]] x_{1,2}=\frac{-b\pm\sqrt{b^2-4ac}}{2a} [[/math]]

with the square root of negative real numbers being defined as:

[[math]] \sqrt{-m}=\pm i\sqrt{m} [[/math]]

Show Proof

We can write our equation in the following way:

[[math]] \begin{eqnarray*} ax^2+bx+c=0 &\iff&x^2+\frac{b}{a}x+\frac{c}{a}=0\\ &\iff&\left(x+\frac{b}{2a}\right)^2-\frac{b^2}{4a^2}+\frac{c}{a}=0\\ &\iff&\left(x+\frac{b}{2a}\right)^2=\frac{b^2-4ac}{4a^2}\\ &\iff&x+\frac{b}{2a}=\pm\frac{\sqrt{b^2-4ac}}{2a} \end{eqnarray*} [[/math]]

Thus, we are led to the conclusion in the statement.

■

We will be back later to this, with generalizations. Getting back now to Definition 3.1 as it is, we can represent the complex numbers in the plane, as follows:

Proposition

The complex numbers, written as usual

[[math]] x=a+ib [[/math]]

can be represented in the plane, according to the following identification:

[[math]] x=\binom{a}{b} [[/math]]

With this convention, the sum of complex numbers is the usual sum of vectors.

Show Proof

Consider indeed two arbitrary complex numbers:

[[math]] x=a+ib\quad,\quad y=c+id [[/math]]

Their sum is then by definition the following complex number:

[[math]] x+y=(a+c)+i(b+d) [[/math]]

Now let us represent [math]x,y[/math] in the plane, as in the statement:

[[math]] x=\binom{a}{b}\quad,\quad y=\binom{c}{d} [[/math]]

In this picture, their sum is given by the following formula:

[[math]] x+y=\binom{a+c}{b+d} [[/math]]

But this is indeed the vector corresponding to [math]x+y[/math], so we are done.

■

Observe that in the above picture, the real numbers correspond to the numbers on the [math]Ox[/math] axis. As for the purely imaginary numbers, these lie on the [math]Oy[/math] axis, with:

[[math]] i=\binom{0}{1} [[/math]]

All this is very nice, but in order to understand now the multiplication, we must do something more complicated, namely using polar coordinates. Let us start with:

Definition

The complex numbers [math]x=a+ib[/math] can be written in polar coordinates,

[[math]] x=r(\cos t+i\sin t) [[/math]]

with the connecting formulae being

[[math]] a=r\cos t\quad,\quad b=r\sin t [[/math]]

and in the other sense being

[[math]] r=\sqrt{a^2+b^2}\quad,\quad \tan t=b/a [[/math]]

and with [math]r,t[/math] being called modulus, and argument.

There is a clear relation here with the vector notation from Proposition 3.3, because [math]r[/math] is the length of the vector, and [math]t[/math] is the angle made by the vector with the [math]Ox[/math] axis. As a basic example here, the number [math]i[/math] takes the following form:

[[math]] i=\cos\left(\frac{\pi}{2}\right)+i\sin\left(\frac{\pi}{2}\right) [[/math]]

The point now is that in polar coordinates, the multiplication formula for the complex numbers, which was so far something quite opaque, takes a very simple form:

Theorem

Two complex numbers written in polar coordinates,

[[math]] x=r(\cos s+i\sin s)\quad,\quad y=p(\cos t+i\sin t) [[/math]]

multiply according to the following formula:

[[math]] xy=rp(\cos(s+t)+i\sin(s+t)) [[/math]]

In other words, the moduli multiply, and the arguments sum up.

Show Proof

This can be proved by doing some trigonometry, as follows:

(1) Recall first the definition of [math]\sin,\cos[/math], as being the sides of a right triangle having angle [math]t[/math]. Our first claim is that we have the Pythagoras' theorem, namely:

[[math]] \sin^2t+\cos^2t=1 [[/math]]

But this comes from the following well-known, and genius picture, with the edges of the outer and inner square being respectively [math]\sin t+\cos t[/math] and 1:

[[math]] \xymatrix@R=13pt@C=13pt{ \circ\ar@{-}[r]\ar@{-}[dd]&\circ\ar@{-}[rr]\ar@{-}[drr]&&\circ\ar@{-}[d]\\ &&&\circ\ar@{-}[dd]&\\ \circ\ar@{-}[d]\ar@{-}[uur]\ar@{-}[drr]&&&&\\ \circ\ar@{-}[rr]&&\circ\ar@{-}[r]\ar@{-}[uur]&\circ } [[/math]]

Indeed, when computing the area of the outer square, we obtain:

[[math]] (\sin t+\cos t)^2=1+4\times\frac{\sin t\cos t}{2} [[/math]]

Now when expanding we obtain [math]\sin^2t+\cos^2t=1[/math], as claimed.

(2) Next in line, our claim is that we have the following formulae:

[[math]] \sin(s+t)=\cos s\sin t+\sin s\cos t [[/math]]

[[math]] \cos(s+t)=\cos s\cos t-\sin s\sin t [[/math]]

To be more precise, let us first establish this formula. In order to do so, consider the following picture, consisting of a length 1 line segment, with angles [math]s,t[/math] drawn on each side, and with everything being completed, and lengths computed, as indicated:

[[math]] \xymatrix@R=15pt@C=70pt{ &\circ\ar@{-}[d]^{\sin s/\cos s}\\ \circ\ar@{-}[ur]^{1/\cos s}\ar@{-}[r]_1\ar@{-}[ddr]_{1/\cos t}&\circ\ar@{-}[dd]^{\sin t/\cos t}\\ \\ &\circ } [[/math]]

Now let us compute the area of the big triangle, or rather the double of that area. We can do this in two ways, either directly, with a formula involving [math]\sin(s+t)[/math], or by using the two small triangles, involving functions of [math]s,t[/math]. We obtain in this way:

[[math]] \frac{1}{\cos s}\cdot\frac{1}{\cos t}\cdot\sin(s+t)=\frac{\sin s}{\cos s}\cdot 1+\frac{\sin t}{\cos t}\cdot 1 [[/math]]

But this gives the formula for [math]\sin(s+t)[/math] claimed above. Now by using this formula for [math]\sin(s+t)[/math] we can deduce as well the formula for [math]\cos(s+t)[/math], as follows:

[[math]] \begin{eqnarray*} \cos(s+t) &=&\sin\left(\frac{\pi}{2}-s-t\right)\\ &=&\sin\left[\left(\frac{\pi}{2}-s\right)+(-t)\right]\\ &=&\sin\left(\frac{\pi}{2}-s\right)\cos(-t)+\cos\left(\frac{\pi}{2}-s\right)\sin(-t)\\ &=&\cos s\cos t-\sin s\sin t \end{eqnarray*} [[/math]]

(3) Now back to complex numbers, we want to prove that [math]x=r(\cos s+i\sin s)[/math] and [math]y=p(\cos t+i\sin t)[/math] multiply according to the following formula:

[[math]] xy=rp(\cos(s+t)+i\sin(s+t)) [[/math]]

We can assume that we have [math]r=p=1[/math], by dividing everything by these numbers. Now with this assumption made, we have the following computation:

[[math]] \begin{eqnarray*} xy &=&(\cos s+i\sin s)(\cos t+i\sin t)\\ &=&(\cos s\cos t-\sin s\sin t)+i(\cos s\sin t+\sin s\cos t)\\ &=&\cos(s+t)+i\sin(s+t) \end{eqnarray*} [[/math]]

Thus, we are led to the conclusion in the statement.

■

The above result, which was based on some non-trivial trigonometry, is quite powerful. As a basic application of it, we can now compute powers, as follows:

Theorem

The powers of a complex number, written in polar form,

[[math]] x=r(\cos t+i\sin t) [[/math]]

are given by the following formula, valid for any exponent [math]k\in\mathbb N[/math]:

[[math]] x^k=r^k(\cos kt+i\sin kt) [[/math]]

Moreover, this formula holds in fact for any [math]k\in\mathbb Z[/math], and even for any [math]k\in\mathbb Q[/math].

Show Proof

Given a complex number [math]x[/math], written in polar form as above, and an exponent [math]k\in\mathbb N[/math], we have indeed the following computation, with [math]k[/math] terms everywhere:

[[math]] \begin{eqnarray*} x^k &=&x\ldots x\\ &=&r(\cos t+i\sin t)\ldots r(\cos t+i\sin t)\\ &=&r\ldots r([\cos(t+\ldots+t)+i\sin(t+\ldots+t))\\ &=&r^k(\cos kt+i\sin kt) \end{eqnarray*} [[/math]]

Thus, we are done with the case [math]k\in\mathbb N[/math]. Regarding now the generalization to the case [math]k\in\mathbb Z[/math], it is enough here to do the verification for [math]k=-1[/math], where the formula is:

[[math]] x^{-1}=r^{-1}(\cos(-t)+i\sin(-t)) [[/math]]

But this number [math]x^{-1}[/math] is indeed the inverse of [math]x[/math], because:

[[math]] \begin{eqnarray*} xx^{-1} &=&r(\cos t+i\sin t)\cdot r^{-1}(\cos(-t)+i\sin(-t))\\ &=&\cos(t-t)+i\sin(t-t)\\ &=&\cos 0+i\sin 0\\ &=&1 \end{eqnarray*} [[/math]]

Finally, regarding the generalization to the case [math]k\in\mathbb Q[/math], it is enough to do the verification for exponents of type [math]k=1/n[/math], with [math]n\in\mathbb N[/math]. The claim here is that:

[[math]] x^{1/n}=r^{1/n}\left[\cos\left(\frac{t}{n}\right)+i\sin\left(\frac{t}{n}\right)\right] [[/math]]

In order to prove this, let us compute the [math]n[/math]-th power of this number. We can use the power formula for the exponent [math]n\in\mathbb N[/math], that we already established, and we obtain:

[[math]] \begin{eqnarray*} (x^{1/n})^n &=&(r^{1/n})^n\left[\cos\left(n\cdot\frac{t}{n}\right)+i\sin\left(n\cdot\frac{t}{n}\right)\right]\\ &=&r(\cos t+i\sin t)\\ &=&x \end{eqnarray*} [[/math]]

Thus, we have indeed a [math]n[/math]-th root of [math]x[/math], and our proof is now complete.

■

We should mention that there is a bit of ambiguity in the above, in the case of the exponents [math]k\in\mathbb Q[/math], due to the fact that the square roots, and the higher roots as well, can take multiple values, in the complex number setting. We will be back to this.

Let us discuss now the final and most convenient writing of the complex numbers, which is a well-known variation on the polar writing, as follows:

[[math]] x=re^{it} [[/math]]

In what follows we will not really need the true power of this formula, which is of analytic nature, due to occurrence of the number [math]e[/math]. However, we would like to use the notation [math]x=re^{it}[/math], as everyone does, among others because it simplifies the writing. The point indeed with the above formula comes from the following deep result:

Theorem

We have the following formula, valid for any [math]t\in\mathbb R[/math],

[[math]] e^{it}=\cos t+i\sin t [[/math]]

where [math]e=2.7182\ldots[/math] is the usual constant from analysis.

Show Proof

In order to prove such a result, we must first recall what [math]e[/math] is, and what [math]e^x[/math] is. One way of viewing things is that [math]e^x[/math] is the unique function satisfying:

[[math]] (e^x)'=e^x\quad,\quad e^0=1 [[/math]]

Then, we can set [math]e=e^1[/math], and then prove that [math]e^x[/math] equals indeed [math]e[/math] to the power [math]x[/math]. This is a bit abstract, but is convenient for our purposes. Indeed, the solution to the above derivative problem is easy to work out, as a series, by recurrence, the answer being:

[[math]] e^x=\sum_{k=0}^\infty\frac{x^k}{k!} [[/math]]

Now let us plug [math]x=it[/math] in this formula. We obtain the following formula:

[[math]] \begin{eqnarray*} e^{it} &=&\sum_{k=0}^\infty\frac{(it)^k}{k!}\\ &=&\sum_{k=2l}\frac{(it)^k}{k!}+\sum_{k=2l+1}\frac{(it)^k}{k!}\\ &=&\sum_{l=0}^\infty(-1)^l\frac{t^{2l}}{(2l)!}+i\sum_{l=0}^\infty(-1)^l\frac{t^{2l+1}}{(2l+1)!} \end{eqnarray*} [[/math]]

Our claim now, which will complete the proof, is that we have:

[[math]] \cos t=\sum_{l=0}^\infty(-1)^l\frac{t^{2l}}{(2l)!}\quad,\quad \sin t=\sum_{l=0}^\infty(-1)^l\frac{t^{2l+1}}{(2l+1)!} [[/math]]

In order to prove this claim, let us compute the Taylor series of [math]\cos[/math] and [math]\sin[/math]. By using the formulae for sums of angles, used in the proof of Theorem 3.5, we have:

[[math]] \sin'=\cos\quad,\quad \cos'=-\sin [[/math]]

Thus, we know how to differentiate [math]\sin[/math] and [math]\cos[/math], once, then twice, then as many times as we want to, and with this we can compute the corresponding Taylor series, the answers being those given above. Now by putting everything together, we have:

[[math]] e^{it}=\cos t+i\sin t [[/math]]

Thus, we are led to the conclusion in the statement.

■

All this was quite brief, but we will be back to it with details in chapter 5 below, when doing analysis. For the moment, let us just enjoy all this. We first have:

Theorem

We have the following formula,

[[math]] e^{\pi i}=-1 [[/math]]

and we have [math]E=mc^2[/math] as well.

Show Proof

We have two assertions here, the idea being as follows:

(1) The first formula, [math]e^{\pi i}=-1[/math], which is actually the main formula in mathematics, comes from Theorem 3.7, by setting [math]t=\pi[/math]. Indeed, we obtain:

[[math]] \begin{eqnarray*} e^{\pi i} &=&\cos\pi+i\sin\pi\\ &=&-1+i\cdot 0\\ &=&-1 \end{eqnarray*} [[/math]]

(2) As for [math]E=mc^2[/math], which is the main formula in physics, this is something deep as well. Although we will not really need it here, we recommend learning it too, for symmetry reasons between math and physics, say from Feynman ^[1], ^[2], ^[3].

■

Now back to our [math]x=re^{it}[/math] objectives, with the above theory in hand we can indeed use from now on this notation, the complete statement being as follows:

Theorem

The complex numbers [math]x=a+ib[/math] can be written in polar coordinates,

[[math]] x=re^{it} [[/math]]

with the connecting formulae being

[[math]] a=r\cos t\quad,\quad b=r\sin t [[/math]]

and in the other sense being

[[math]] r=\sqrt{a^2+b^2}\quad,\quad \tan t=b/a [[/math]]

and with [math]r,t[/math] being called modulus, and argument.

Show Proof

This is just a reformulation of Definition 3.4, by using the formula [math]e^{it}=\cos t+i\sin t[/math] from Theorem 3.7, and multiplying everything by [math]r[/math].

■

We can now go back to the basics, and we have the following result:

Theorem

In polar coordinates, the complex numbers multiply as

[[math]] re^{is}\cdot pe^{it}=rp\,e^{i(s+t)} [[/math]]

with the arguments [math]s,t[/math] being taken modulo [math]2\pi[/math].

Show Proof

This is something that we already know, from Theorem 3.5, reformulated by using the notations from Theorem 3.9. Observe that this follows as well directly, from the fact that we have [math]e^{a+b}=e^ae^b[/math], that we know from analysis.

■

We can now investigate more complicated operations, as follows:

Theorem

We have the following operations on the complex numbers:

Inversion: [math](re^{it})^{-1}=r^{-1}e^{-it}[/math].
Square roots: [math]\sqrt{re^{it}}=\pm\sqrt{r}e^{it/2}[/math].
Powers: [math](re^{it})^a=r^ae^{ita}[/math].

Show Proof

This is something that we already know, from Theorem 3.6, but we can now discuss all this, from a more conceptual viewpoint, the idea being as follows:

(1) We have indeed the following computation, using Theorem 3.10:

[[math]] (re^{it})(r^{-1}e^{-it}) =rr^{-1}\cdot e^{i(t-t)} =1 [[/math]]

(2) Once again by using Theorem 3.10, we have:

[[math]] (\pm\sqrt{r}e^{it/2})^2 =(\sqrt{r})^2e^{i(t/2+t/2)} =re^{it} [[/math]]

(3) Given an arbitrary number [math]a\in\mathbb R[/math], we can define, as stated:

[[math]] (re^{it})^a=r^ae^{ita} [[/math]]

Due to Theorem 3.10, this operation [math]x\to x^a[/math] is indeed the correct one.

■

We can now go back to the degree 2 equations, and we have:

Theorem

The complex solutions of [math]ax^2+bx+c=0[/math] with [math]a,b,c\in\mathbb C[/math] are

[[math]] x_{1,2}=\frac{-b\pm\sqrt{b^2-4ac}}{2a} [[/math]]

with the square root of complex numbers being defined as above.

Show Proof

This is clear, the computations being the same as in the real case. To be more precise, our degree 2 equation can be written as follows:

[[math]] \left(x+\frac{b}{2a}\right)^2=\frac{b^2-4ac}{4a^2} [[/math]]

Now since we know from Theorem 3.11 (2) that any complex number has a square root, we are led to the conclusion in the statement.

■

More generally now, we can prove that any polynomial equation, of arbitrary degree [math]N\in\mathbb N[/math], has exactly [math]N[/math] complex solutions, counted with multiplicities:

Theorem

Any polynomial [math]P\in\mathbb C[X][/math] decomposes as

[[math]] P=c(X-a_1)\ldots (X-a_N) [[/math]]

with [math]c\in\mathbb C[/math] and with [math]a_1,\ldots,a_N\in\mathbb C[/math].

Show Proof

The problem is that of proving that our polynomial has at least one root, because afterwards we can proceed by recurrence. We prove this by contradiction. So, assume that [math]P[/math] has no roots, and pick a number [math]z\in\mathbb C[/math] where [math]|P|[/math] attains its minimum:

[[math]] |P(z)|=\min_{x\in\mathbb C}|P(x)| \gt 0 [[/math]]

Since [math]Q(t)=P(z+t)-P(z)[/math] is a polynomial which vanishes at [math]t=0[/math], this polynomial must be of the form [math]ct^k[/math] + higher terms, with [math]c\neq0[/math], and with [math]k\geq1[/math] being an integer. We obtain from this that, with [math]t\in\mathbb C[/math] small, we have the following estimate:

[[math]] P(z+t)\simeq P(z)+ct^k [[/math]]

Now let us write [math]t=rw[/math], with [math]r \gt 0[/math] small, and with [math]|w|=1[/math]. Our estimate becomes:

[[math]] P(z+rw)\simeq P(z)+cr^kw^k [[/math]]

Now recall that we have assumed [math]P(z)\neq0[/math]. We can therefore choose [math]w\in\mathbb T[/math] such that [math]cw^k[/math] points in the opposite direction to that of [math]P(z)[/math], and we obtain in this way:

[[math]] \begin{eqnarray*} |P(z+rw)| &\simeq&|P(z)+cr^kw^k|\\ &=&|P(z)|(1-|c|r^k) \end{eqnarray*} [[/math]]

Now by choosing [math]r \gt 0[/math] small enough, as for the error in the first estimate to be small, and overcame by the negative quantity [math]-|c|r^k[/math], we obtain from this:

[[math]] |P(z+rw)| \lt |P(z)| [[/math]]

But this contradicts our definition of [math]z\in\mathbb C[/math], as a point where [math]|P|[/math] attains its minimum. Thus [math]P[/math] has a root, and by recurrence it has [math]N[/math] roots, as stated.

■

All this is very nice, and we will see applications in a moment. As a last topic now regarding the complex numbers, we have:

Theorem

The equation [math]x^N=1[/math] has [math]N[/math] complex solutions, namely

[[math]] \left\{w^k\Big|k=0,1,\ldots,N-1\right\}\quad,\quad w=e^{2\pi i/N} [[/math]]

which are called roots of unity of order [math]N[/math].

Show Proof

This follows from Theorem 3.10. Indeed, with [math]x=re^{it}[/math] our equation reads:

[[math]] r^Ne^{itN}=1 [[/math]]

Thus [math]r=1[/math], and [math]t\in[0,2\pi)[/math] must be a multiple of [math]2\pi/N[/math], as stated.

■

As an illustration here, the roots of unity of small order, along with some of their basic properties, which are very useful for computations, are as follows:

\underline{[math]N=1[/math]}. Here the unique root of unity is 1.

\underline{[math]N=2[/math]}. Here we have two roots of unity, namely 1 and [math]-1[/math].

\underline{[math]N=3[/math]}. Here we have 1, then [math]w=e^{2\pi i/3}[/math], and then [math]w^2=\bar{w}=e^{4\pi i/3}[/math].

\underline{[math]N=4[/math]}. Here the roots of unity, read as usual counterclockwise, are [math]1,i,-1,-i[/math].

\underline{[math]N=5[/math]}. Here, with [math]w=e^{2\pi i/5}[/math], the roots of unity are [math]1,w,w^2,w^3,w^4[/math].

\underline{[math]N=6[/math]}. Here a useful alternative writing is [math]\{\pm1,\pm w,\pm w^2\}[/math], with [math]w=e^{2\pi i/3}[/math].

The roots of unity are very useful variables, and have many interesting properties. As a first application, we can now solve the ambiguity questions related to the extraction of [math]N[/math]-th roots, from Theorem 3.6 and Theorem 3.11, the statement being as follows:

Theorem

Any nonzero complex number, written as

[[math]] x=re^{it} [[/math]]

has exactly [math]N[/math] roots of order [math]N[/math], which appear as

[[math]] y=r^{1/N}e^{it/N} [[/math]]

multiplied by the [math]N[/math] roots of unity of order [math]N[/math].

Show Proof

We must solve the equation [math]z^N=x[/math], over the complex numbers. Since the number [math]y[/math] in the statement clearly satisfies [math]y^N=x[/math], our equation is equivalent to:

[[math]] z^N=y^N [[/math]]

Now observe that we can write this equation as follows:

[[math]] \left(\frac{z}{y}\right)^N=1 [[/math]]

We conclude that the solutions [math]z[/math] appear by multiplying [math]y[/math] by the solutions of [math]t^N=1[/math], which are the [math]N[/math]-th roots of unity, as claimed.

■

The roots of unity appear in connection with many other questions, and there are many useful formulae relating them, which are good to know, as for instance:

Theorem

The roots of unity, [math]\{w^k\}[/math] with [math]w=e^{2\pi i/N}[/math], have the property

[[math]] \sum_{k=0}^{N-1}(w^k)^s=N\delta_{N|s} [[/math]]

for any exponent [math]s\in\mathbb N[/math], where on the right we have a Kronecker symbol.

Show Proof

The numbers in the statement, when written more conveniently as [math](w^s)^k[/math] with [math]k=0,\ldots,N-1[/math], form a certain regular polygon in the plane [math]P_s[/math]. Thus, if we denote by [math]C_s[/math] the barycenter of this polygon, we have the following formula:

[[math]] \frac{1}{N}\sum_{k=0}^{N-1}w^{ks}=C_s [[/math]]

Now observe that in the case [math]N\slash\hskip-1.6mm|\,s[/math] our polygon [math]P_s[/math] is non-degenerate, circling around the unit circle, and having center [math]C_s=0[/math]. As for the case [math]N|s[/math], here the polygon is degenerate, lying at 1, and having center [math]C_s=1[/math]. Thus, we have the following formula:

[[math]] C_s=\delta_{N|s} [[/math]]

Thus, we obtain the formula in the statement.

■

Before moving ahead, let us recommend some reading. The mathematics in this book requires a deep commitment to the complex numbers. And here, while there are surely plenty of good logical reasons for using [math]\mathbb C[/math] instead of [math]\mathbb R[/math], such as those evoked above, regarding real polynomials which can have complex roots, or real matrices which can have complex eigenvalues, and so on, things remain a bit abstract.

In order to overcome this hurdle, and get to love complex numbers, nothing better than learning some physics. Indeed, physics is all about waves. Water waves of course, but also light and other electromagnetic waves, and also elementary particles, which are treated like waves too. And all these waves naturally live over [math]\mathbb C[/math]. And even more, you will learn in fact that quantum mechanics itself lives over [math]\mathbb C[/math], and so farewell [math]\mathbb R[/math].

The standard books for learning physics are those of Feynman ^[1], ^[2], ^[3]. At a more advanced level, equally fun and entertaining, you have the books of Griffiths ^[4], ^[5], ^[6]. There are also plenty of more popular books that you can start with, such as Griffiths again ^[7], or Huang ^[8], or Kumar ^[9]. Up to you here, and in the hope that you will follow my advice. There ain't such thing as mathematics without physics, and the answer to any mathematical question that you might have, technical or philosophical, now and in the future, always, but really always, believe me, lies in physics.

3b. Linear maps

Back now to linear algebra, our first task will be that of extending the results that we know, from the real case, to the complex case. We first have:

Theorem

The linear maps [math]f:\mathbb C^N\to\mathbb C^M[/math] are the maps of the form

[[math]] f(x)=Ax [[/math]]

with [math]A[/math] being a rectangular matrix, [math]A\in M_{M\times N}(\mathbb C)[/math].

Show Proof

This follows as in the real case. Indeed, [math]f:\mathbb C^N\to\mathbb C^M[/math] must send a vector [math]x\in\mathbb C^N[/math] to a certain vector [math]f(x)\in\mathbb C^M[/math], all whose components are linear combinations of the components of [math]x[/math]. Thus, we can write, for certain complex numbers [math]a_{ij}\in\mathbb C[/math]:

[[math]] f\begin{pmatrix} x_1\\ \vdots\\ \vdots\\ x_N \end{pmatrix} =\begin{pmatrix} a_{11}x_1+\ldots+a_{1N}x_N\\ \vdots\\ \vdots\\ a_{M1}x_1+\ldots+a_{MN}x_N \end{pmatrix} [[/math]]

But the parameters [math]a_{ij}\in\mathbb C[/math] can be regarded as being the entries of a matrix:

[[math]] A=(a_{ij})\in M_{M\times N}(\mathbb C) [[/math]]

Now with the usual convention for the rectangular matrix multiplication, exactly as in the real case, the above formula is precisely the one in the statement.

■

We have as well the following result:

Theorem

A linear map [math]f:\mathbb C^N\to\mathbb C^M[/math], written as

[[math]] f(v)=Av [[/math]]

is invertible precisely when [math]A[/math] is invertible, and in this case we have:

[[math]] f^{-1}(v)=A^{-1}v [[/math]]

Show Proof

As in the real case, with the convention [math]f_A(v)=Av[/math], we have the following multiplication formula for such linear maps:

[[math]] f_Af_B(v)=f_{AB}(v) [[/math]]

But this shows that [math]f_Af_B=1[/math] is equivalent to [math]AB=1[/math], as desired.

■

With respect to the real case, some subtleties appear at the level of the scalar products, isometries and projections. The basic theory here is as follows:

Theorem

Consider the usual scalar product [math] \lt x,y \gt =\sum_ix_i\bar{y}_i[/math] on [math]\mathbb C^N[/math].

We have the following formula, where [math](A^*)_{ij}=\bar{A}_{ji}[/math] is the adjoint matrix:
[[math]] \lt Ax,y \gt = \lt x,A^*y \gt [[/math]]
A linear map [math]f:\mathbb C^N\to\mathbb C^N[/math], written as [math]f(x)=Ux[/math] with [math]U\in M_N(\mathbb C)[/math], is an isometry precisely when [math]U[/math] is unitary, in the sense that:
[[math]] U^*=U^{-1} [[/math]]
A linear map [math]f:\mathbb C^N\to\mathbb C^N[/math], written as [math]f(x)=Px[/math] with [math]P\in M_N(\mathbb C)[/math], is a porojection precisely when [math]P[/math] is projection, in the sense that:
[[math]] P=P^2=P^* [[/math]]
The formula for the rank [math]1[/math] projections is as follows:
[[math]] P_x=\frac{1}{||x||^2}(x_i\bar{x}_j)_{ij} [[/math]]

Show Proof

This follows as in the real case, by performing modifications where needed:

(1) By using the standard basis of [math]\mathbb C^N[/math], we want to prove that for any [math]i,j[/math] we have:

[[math]] \lt Ae_j,e_i \gt = \lt e_j,A^*e_i \gt [[/math]]

The scalar product being now antisymmetric, this is the same as proving that:

[[math]] \lt Ae_j,e_i \gt =\overline{ \lt A^*e_i,e_j \gt } [[/math]]

On the other hand, for any matrix [math]M[/math] we have the following formula:

[[math]] M_{ij}= \lt Me_j,e_i \gt [[/math]]

Thus, the formula to be proved simply reads:

[[math]] A_{ij}=\overline{(A^*)_{ji}} [[/math]]

But this is precisely the definition of [math]A^*[/math], and we are done.

(2) Let first recall that we can pass from scalar products to distances, as follows:

[[math]] ||x||=\sqrt{ \lt x,x \gt } [[/math]]

Conversely, we can compute the scalar products in terms of distances, by using the complex polarization identity, which is as follows:

[[math]] \begin{eqnarray*} &&||x+y||^2-||x-y||^2+i||x+iy||^2-i||x-iy||^2\\ &=&||x||^2+||y||^2-||x||^2-||y||^2+i||x||^2+i||y||^2-i||x||^2-i||y||^2\\ &&+2Re( \lt x,y \gt )+2Re( \lt x,y \gt )+2iIm( \lt x,y \gt )+2iIm( \lt x,y \gt )\\ &=&4 \lt x,y \gt \end{eqnarray*} [[/math]]

Now given a matrix [math]U\in M_N(\mathbb C)[/math], we have the following equivalences, with the first one coming from the above identities, and with the other ones being clear:

[[math]] \begin{eqnarray*} ||Ux||=||x|| &\iff& \lt Ux,Uy \gt = \lt x,y \gt \\ &\iff& \lt x,U^*Uy \gt = \lt x,y \gt \\ &\iff&U^*Uy=y\\ &\iff&U^*U=1\\ &\iff&U^*=U^{-1} \end{eqnarray*} [[/math]]

(3) As in the real case, [math]P[/math] is an abstract projection, not necessarily orthogonal, when [math]P^2=P[/math]. The point now is that this projection is orthogonal when:

[[math]] \begin{eqnarray*} \lt Px-Py,Px-x \gt =0 &\iff& \lt x-y,P^*Px-P^*x \gt =0\\ &\iff&P^*Px-P^*x=0\\ &\iff&P^*P-P^*=0 \end{eqnarray*} [[/math]]

Thus we must have [math]P^*=P^*P[/math]. Now observe that by conjugating, we obtain:

[[math]] \begin{eqnarray*} P &=&(P^*P)^*\\ &=&P^*(P^*)^*\\ &=&P^*P \end{eqnarray*} [[/math]]

Now by comparing with the original relation, [math]P^*=P^*P[/math], we conclude that [math]P=P^*[/math]. Thus, we have shown that any orthogonal projection must satisfy, as claimed:

[[math]] P^2=P=P^* [[/math]]

Conversely, if this condition is satisfied, [math]P^2=P[/math] shows that [math]P[/math] is a projection, and [math]P=P^*[/math] shows via the above computation that [math]P[/math] is indeed orthogonal.

(4) Once again in analogy with the real case, we have the following formula:

[[math]] \begin{eqnarray*} P_xy &=&\frac{ \lt y,x \gt }{ \lt x,x \gt }\,x\\ &=&\frac{1}{||x||^2} \lt y,x \gt x \end{eqnarray*} [[/math]]

With this in hand, we can now compute the entries of [math]P_x[/math], as follows:

[[math]] \begin{eqnarray*} (P_x)_{ij} &=& \lt P_xe_j,e_i \gt \\ &=&\frac{1}{||x||^2} \lt e_j,x \gt \lt x,e_i \gt \\ &=&\frac{\bar{x}_jx_i}{||x||^2} \end{eqnarray*} [[/math]]

Thus, we are led to the formula in the statement.

■

3c. Diagonalization

In the present complex matrix setting, we can talk as well about eigenvalues and eigenvectors, exactly as in the real case, as follows:

Definition

Let [math]A\in M_N(\mathbb C)[/math] be a square matrix. When [math]Av=\lambda v[/math] we say that:

[math]v[/math] is an eigenvector of [math]A[/math].
[math]\lambda[/math] is an eigenvalue of [math]A[/math].

We say that [math]A[/math] is diagonalizable when [math]\mathbb C^N[/math] has a basis of eigenvectors of [math]A[/math].

When [math]A[/math] is diagonalizable, in that basis of eigenvectors we can write:

[[math]] A=\begin{pmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N \end{pmatrix} [[/math]]

In general, this means that we have a formula as follows, with [math]D[/math] diagonal:

[[math]] A=PDP^{-1} [[/math]]

Indeed, we can take [math]P[/math] to be the matrix formed by the eigenvectors:

[[math]] P=[v_1\ldots v_N] [[/math]]

As a first interesting result now, regarding the real matrices, we have:

Theorem

The eigenvalues of a real matrix [math]A\in M_N(\mathbb R)[/math] are the roots of the characteristic polynomial, given by:

[[math]] P(x)=\det(A-x1_N) [[/math]]

In particular, any such matrix [math]A\in M_N(\mathbb R)[/math] has at least [math]1[/math] complex eigenvalue.

Show Proof

The first assertion is something that we already know from chapter 2 above, which follows from the following computation:

[[math]] \begin{eqnarray*} \exists v,Av=\lambda v &\iff&\exists v,(A-\lambda 1_N)v=0\\ &\iff&\det(A-\lambda 1_N)=0 \end{eqnarray*} [[/math]]

As for the second assertion, this follows from the first assertion, and from Theorem 3.13, which shows in particular that [math]P[/math] has at least [math]1[/math] complex root.

■

It is possible to further build on these results, but this is quite long, and we will rather do this in the next chapter. For the moment, let us just keep in mind the conclusion that a real matrix [math]A\in M_N(\mathbb R)[/math] has substantially more chances of being diagonalizable over the complex numbers, than over the real numbers. As an illustration for this principle, and as a first concrete result, which is of true complex nature, we have:

Theorem

The rotation of angle [math]t\in\mathbb R[/math] in the real plane, namely

[[math]] R_t=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix} [[/math]]

can be diagonalized over the complex numbers, as follows:

[[math]] R_t=\frac{1}{2}\begin{pmatrix}1&1\\i&-i\end{pmatrix} \begin{pmatrix}e^{-it}&0\\0&e^{it}\end{pmatrix} \begin{pmatrix}1&-i\\1&i\end{pmatrix} [[/math]]

Over the real numbers this is impossible, unless [math]t=0,\pi[/math].

Show Proof

The last assertion is something clear, that we already know, coming from the fact that at [math]t\neq0,\pi[/math] our rotation is a “true” rotation, having no eigenvectors in the plane. Regarding the first assertion, the point is that we have the following computation:

[[math]] \begin{eqnarray*} R_t\binom{1}{i} &=&\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{i}\\ &=&\binom{\cos t-i\sin t}{i\cos t+\sin t}\\ &=&e^{-it}\binom{1}{i} \end{eqnarray*} [[/math]]

We have as well a second eigenvector, as follows:

[[math]] \begin{eqnarray*} R_t\binom{1}{-i} &=&\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{-i}\\ &=&\binom{\cos t+i\sin t}{-i\cos t+\sin t}\\ &=&e^{it}\binom{1}{-i} \end{eqnarray*} [[/math]]

Thus our matrix [math]R_t[/math] is diagonalizable over [math]\mathbb C[/math], with the diagonal form being:

[[math]] R_t\sim\begin{pmatrix}e^{-it}&0\\0&e^{it}\end{pmatrix} [[/math]]

As for the passage matrix, obtained by putting together the eigenvectors, this is:

[[math]] P=\begin{pmatrix}1&1\\i&-i\end{pmatrix} [[/math]]

In order to invert now [math]P[/math], we can use the standard inversion formula for the [math]2\times2[/math] matrices, which is the same as the one in the real case, and which gives:

[[math]] P^{-1} =\frac{1}{-2i}\begin{pmatrix}-i&-1\\-i&1\end{pmatrix} =\frac{1}{2}\begin{pmatrix}1&-i\\1&i\end{pmatrix} [[/math]]

Our diagonalization formula is therefore as follows:

[[math]] R_t=\frac{1}{2}\begin{pmatrix}1&1\\i&-i\end{pmatrix} \begin{pmatrix}e^{-it}&0\\0&e^{it}\end{pmatrix} \begin{pmatrix}1&-i\\1&i\end{pmatrix} [[/math]]

Thus, we are led to the conclusion in the statement.

■

3d. The determinant

Regarding now the determinant, for the complex matrices it is more convenient to follow an abstract approach, and this due to our lack of geometric intuition with the space [math]\mathbb C^N[/math], at [math]N\geq2[/math], and with the volumes of the bodies there. We have:

Definition

The determinant of a complex matrix [math]A\in M_N(\mathbb C)[/math] is given by

[[math]] \det A=\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{1\sigma(1)}\ldots A_{N\sigma(N)} [[/math]]

with [math]\varepsilon=\pm1[/math] being the signature of the permutations.

Generally speaking, the theory of the determinant from the real case extends well. To be more precise, we first have the following result, summarizing the needed properties and formulae of the determinant of the complex matrices:

Theorem

The determinant has the following properties:

When adding two columns, the determinants get added:
[[math]] \det(\ldots,u+v,\ldots) =\det(\ldots,u,\ldots) +\det(\ldots,v,\ldots) [[/math]]
When multiplying columns by scalars, the determinant gets multiplied:
[[math]] \det(\lambda v_1,\ldots,\lambda_Nv_N)=\lambda_1\ldots\lambda_N\det(v_1,\ldots,v_N) [[/math]]
When permuting two columns, the determinant changes the sign:
[[math]] \det(\ldots,v,\ldots,w,\ldots)=-\det(\ldots,w,\ldots,v,\ldots) [[/math]]

Show Proof

This follows indeed by doing some elementary algebraic computations with permutations, which are similar to those in the real case, but now done backwards, based on the formula of the determinant from Definition 3.23.

■

We have as well a similar result for the rows, which completes the list of needed operations which are usually needed in practice, as follows:

Theorem

The determinant has the following properties:

When adding two rows, the determinants get added:
[[math]] \det\begin{pmatrix}\vdots\\ u+v\\ \vdots\end{pmatrix} =\det\begin{pmatrix}\vdots\\ u\\ \vdots\end{pmatrix} +\det\begin{pmatrix}\vdots\\ v\\ \vdots\end{pmatrix} [[/math]]
When multiplying rows by scalars, the determinant gets multiplied:
[[math]] \det\begin{pmatrix}\lambda_1v_1\\ \vdots\\ \lambda_Nv_N\end{pmatrix} =\lambda_1\ldots\lambda_N\det\begin{pmatrix}v_1\\ \vdots\\ v_N\end{pmatrix} [[/math]]
When permuting two rows, the determinant changes the sign.

Show Proof

This follows once again by doing some algebraic computations with permutations, based on the formula of the determinant from Definition 3.23.

■

Next in line, we have the following result, which is very useful in practice:

Theorem

The determinant is subject to the row expansion formula

[[math]] \begin{eqnarray*} \begin{vmatrix}a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{NN}\end{vmatrix} &=&a_{11}\begin{vmatrix}a_{22}&\ldots&a_{2N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix}\\ &-&a_{12}\begin{vmatrix}a_{21}&a_{23}&\ldots&a_{2N}\\ \vdots&\vdots&&\vdots\\ a_{N1}&a_{N3}&\ldots&a_{NN}\end{vmatrix}\\ &\vdots&\\ &\vdots&\\ &+&(-1)^{N+1}a_{1N}\begin{vmatrix}a_{21}&\ldots&a_{2,N-1}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{N,N-1}\end{vmatrix} \end{eqnarray*} [[/math]]

and this method fully computes it, by recurrence.

Show Proof

This follows indeed by doing some elementary algebraic computations.

■

We can expand as well over the columns, as follows:

Theorem

The determinant is subject to the column expansion formula

[[math]] \begin{eqnarray*} \begin{vmatrix}a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{NN}\end{vmatrix} &=&a_{11}\begin{vmatrix}a_{22}&\ldots&a_{2N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix}\\ &-&a_{21}\begin{vmatrix}a_{12}&\ldots&a_{1N}\\ a_{32}&\ldots&a_{3N}\\ \vdots&&\vdots\\ a_{N2}&\ldots&a_{NN}\end{vmatrix}\\ &\vdots&\\ &\vdots&\\ &+&(-1)^{N+1}a_{N1}\begin{vmatrix}a_{12}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N-1,2}&\ldots&a_{N-1,N}\end{vmatrix} \end{eqnarray*} [[/math]]

and this method fully computes it, by recurrence.

Show Proof

Once again, this follows by doing some algebraic computations.

■

Still in analogy with the real case, we have the following result:

Theorem

The determinant of the systems of vectors

[[math]] \det:\mathbb C^N\times\ldots\times\mathbb C^N\to\mathbb C [[/math]]

is multilinear, alternate and unital, and unique with these properties.

Show Proof

This is something that we know in the real case, and the proof in the complex case is similar, with the conditions in the statement corresponding to those in Theorem 3.24. It is possible to prove this result as well directly, by doing some abstract algebra.

■

Finally, once again at the general level, let us record the following result:

Theorem

We have the following formulae,

[[math]] \det\bar{A}=\overline{\det A} [[/math]]

[[math]] \det A^t=\det A [[/math]]

[[math]] \det A^*=\overline{\det A} [[/math]]

valid for any square matrix [math]A\in M_N(\mathbb C)[/math].

Show Proof

The first formula is clear from Definition 3.23, because when conjugating the entries of [math]A[/math], the determinant will get conjugated:

[[math]] \det\bar{A}=\sum_{\sigma\in S_N}\varepsilon(\sigma)\overline{A_{1\sigma(1)}\ldots A_{N\sigma(N)}} [[/math]]

The second formula follows as in the real case, as follows:

[[math]] \begin{eqnarray*} \det A^t &=&\sum_{\sigma\in S_N}\varepsilon(\sigma)(A^t)_{1\sigma(1)}\ldots(A^t)_{N\sigma(N)}\\ &=&\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{\sigma(1)1}\ldots A_{\sigma(N)N}\\ &=&\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{1\sigma^{-1}(1)}\ldots A_{N\sigma^{-1}(N)}\\ &=&\sum_{\sigma\in S_N}\varepsilon(\sigma^{-1})A_{1\sigma^{-1}(1)}\ldots A_{N\sigma^{-1}(N)}\\ &=&\sum_{\sigma\in S_N}\varepsilon(\sigma)A_{1\sigma(1)}\ldots A_{N\sigma(N)}\\ &=&\det A \end{eqnarray*} [[/math]]

As for the third formula, this follows from the first two formulae, by using:

[[math]] \det A^*=\det \bar{A}^t [[/math]]

Thus, we are led to the conclusions in the statement.

■

Summarizing, the theory from the real case extends well, and we have complex analogues of all results. As in the real case, as a main application of all this, we have:

Theorem

The inverse of a square matrix, having nonzero determinant,

[[math]] A=\begin{pmatrix}a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{N1}&\ldots&a_{NN}\end{pmatrix} [[/math]]

is given by the following formula,

[[math]] A^{-1}=\frac{1}{\det A} \begin{pmatrix} \det A^{(11)}&-\det A^{(21)}&\det A^{(31)}&\ldots\\ -\det A^{(12)}&\det A^{(22)}&-\det A^{(32)}&\ldots\\ \det A^{(13)}&-\det A^{(23)}&\det A^{(33)}&\ldots\\ \vdots&\vdots&\vdots& \end{pmatrix} [[/math]]

where [math]A^{(ij)}[/math] is the matrix [math]A[/math], with the [math]i[/math]-th row and [math]j[/math]-th column removed.

Show Proof

This follows indeed by using the row expansion formula from Theorem 3.26, which in terms of the matrix [math]A^{-1}[/math] in the statement reads [math]AA^{-1}=1[/math].

■

As a final topic now, regarding the complex matrices, let us discuss some interesting examples of such matrices, which definitely do not exist in the real setting, and which are very useful, even in connection with real matrix questions. Let us start with:

Definition

The Fourier matrix is as follows,

[[math]] F_N=(w^{ij})_{ij} [[/math]]

with [math]w=e^{2\pi i/N}[/math], and with the convention that the indices are

[[math]] i,j\in\{0,1,\ldots,N-1\} [[/math]]

and are usually taken modulo [math]N[/math].

Here the conventions regarding the indices are standard, and are there for various reasons, as for instance for having the first row and column consisting of 1 entries. Indeed, in standard matrix form, and with the above conventions for the indices, we have:

[[math]] F_N= \begin{pmatrix} 1&1&1&\ldots&1\\ 1&w&w^2&\ldots&w^{N-1}\\ 1&w^2&w^4&\ldots&w^{2(N-1)}\\ \vdots&\vdots&\vdots&&\vdots\\ 1&w^{N-1}&w^{2(N-1)}&\ldots&w^{(N-1)^2} \end{pmatrix} [[/math]]

Thus, what we have here is a Vandermonde matrix, in the sense of chapter 2, of very special type. Let us record as well the first few values of these matrices:

Proposition

The second Fourier matrix is as follows:

[[math]] F_2=\begin{pmatrix}1&1\\1&-1\end{pmatrix} [[/math]]

The third Fourier matrix is as follows, with [math]w=e^{2\pi i/3}[/math]:

[[math]] F_3=\begin{pmatrix}1&1&1\\1&w&w^2\\1&w^2&w\end{pmatrix} [[/math]]

As for the fourth Fourier matrix, this is as follows:

[[math]] F_4=\begin{pmatrix}1&1&1&1\\1&i&-1&-i\\1&-1&1&-1\\1&-i&-1&i\end{pmatrix} [[/math]]

Show Proof

All these formulae are clear from definitions, with our usual convention for the indices of the Fourier matrices, from Definition 3.31.

■

Our claim now is that the Fourier matrix can be used in order to solve a variety of linear algebra questions, a bit in a same way as the Fourier transform can be used in order to solve analysis questions. Before discussing all this, however, let us analyze the Fourier matrix [math]F_N[/math], from a linear algebra perspective. We have the following result:

Theorem

The Fourier matrix [math]F_N[/math] has the following properties:

It is symmetric, [math]F_N^t=F_N[/math].
The matrix [math]F_N/\sqrt{N}[/math] is unitary.
Its inverse is the matrix [math]F_N^*/N[/math].

Show Proof

This is a collection of elementary results, the idea being as follows:

(1) This is clear from definitions.

(2) The row vectors [math]R_0,\ldots,R_{N-1}[/math] of the rescaled matrix [math]F_N/\sqrt{N}[/math] have all length 1, and by using the barycenter formula in Theorem 3.16, we have, for any [math]i\neq j[/math]:

[[math]] \begin{eqnarray*} \lt R_i,R_j \gt &=&\frac{1}{N}\sum_kw^{ik}w^{-jk}\\ &=&\frac{1}{N}\sum_k(w^{i-j})^k\\ &=&0 \end{eqnarray*} [[/math]]

Thus, [math]R_0,\ldots,R_{N-1}[/math] are pairwise orthogonal, and so [math]F_N/\sqrt{N}[/math] is unitary, as claimed.

(3) This follows from (1) and (2), because for a symmetric matrix, the adjoint is the conjugate, and in the unitary case, this is the inverse.

■

Now back to our motivations, we were saying before that the Fourier matrix is to linear algebra what the Fourier transform is to analysis, namely advanced technology. In order to discuss now an illustrating application of the theory developed above, let us go back to our favorite example of a [math]N\times N[/math] matrix, namely the flat matrix:

[[math]] \mathbb I_N=\begin{pmatrix} 1&\ldots&1\\ \vdots&&\vdots\\ 1&\ldots&1\end{pmatrix} [[/math]]

This is a real matrix, and we know that we have [math]\mathbb I_N=NP_N[/math], with [math]P_N[/math] being the projection on the all-1 vector [math]\xi=(1)_i\in\mathbb R^N[/math]. Thus, [math]\mathbb I_N[/math] diagonalizes over [math]\mathbb R[/math]:

[[math]] \mathbb I_N\sim \begin{pmatrix} N\\ &0\\ &&\ddots\\ &&&0 \end{pmatrix} [[/math]]

The problem, however, is that when looking for 0-eigenvectors, in order to have an explicit diagonalization formula, we must solve the following equation:

[[math]] x_1+\ldots+x_N=0 [[/math]]

And this is not an easy task, if our objective is that of finding a nice and explicit basis for the space of solutions. To be more precise, if we want linearly independent vectors [math]v_1,\ldots,v_{N-1}\in\mathbb R^N[/math], each with components summing up to 0, and which are given by simple formulae, of type [math](v_i)_j=[/math] explicit function of [math]i,j[/math], we are in trouble.

Fortunately, complex numbers and Fourier analysis are there, and we have:

Theorem

The flat matrix, namely

[[math]] \mathbb I_N=\begin{pmatrix} 1&\ldots&1\\ \vdots&&\vdots\\ 1&\ldots&1\end{pmatrix} [[/math]]

has the following explicit diagonalization, over the complex numbers,

[[math]] \mathbb I_N=\frac{1}{N}F_NQF_N^* [[/math]]

with [math]F_N=(w^{ij})_{ij}[/math] being the Fourier matrix, and with [math]Q=diag(N,0,\ldots,0)[/math].

Show Proof

Indeed, the 0-eigenvector problem discussed above can be solved explicitely over the complex numbers, by using the formula in Theorem 3.16, with the solution [math](v_i)_j=w^{ij}[/math], with [math]w=e^{2\pi i/N}[/math]. Thus, we are led to the conclusion in the statement.

■

There are many other uses of the Fourier matrix [math]F_N[/math], along the same lines. We will be back to all this in chapter 7 below, with a complete discussion of the Fourier matrices, and of their generalizations, called complex Hadamard matrices.

General references

Banica, Teo (2024). "Linear algebra and group theory". arXiv:2206.09283 [math.CO].

References

^1.0 ^1.1 R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics I: mainly mechanics, radiation and heat, Caltech (1963).
^2.0 ^2.1 R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics II: mainly electromagnetism and matter, Caltech (1964).
^3.0 ^3.1 R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics III: quantum mechanics, Caltech (1966).
D.J. Griffiths, Introduction to electrodynamics, Cambridge Univ. Press (2017).
D.J. Griffiths and D.F. Schroeter, Introduction to quantum mechanics, Cambridge Univ. Press (2018).
D.J. Griffiths, Introduction to elementary particles, Wiley (2020).
D.J. Griffiths, Revolutions in twentieth-century physics, Cambridge Univ. Press (2012).
K. Huang, Fundamental forces of nature, World Scientific (2007).
M. Kumar, Quantum: Einstein, Bohr, and the great debate about the nature of reality, Norton (2009).

[fe1-1] 1.0 ^1.1 R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics I: mainly mechanics, radiation and heat, Caltech (1963).

[fe2-2] 2.0 ^2.1 R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics II: mainly electromagnetism and matter, Caltech (1964).

[fe3-3] 3.0 ^3.1 R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics III: quantum mechanics, Caltech (1966).

[gr1-4] D.J. Griffiths, Introduction to electrodynamics, Cambridge Univ. Press (2017).

[gr2-5] D.J. Griffiths and D.F. Schroeter, Introduction to quantum mechanics, Cambridge Univ. Press (2018).

[gr3-6] D.J. Griffiths, Introduction to elementary particles, Wiley (2020).

[gr4-7] D.J. Griffiths, Revolutions in twentieth-century physics, Cambridge Univ. Press (2012).

[hu2-8] K. Huang, Fundamental forces of nature, World Scientific (2007).

[kum-9] M. Kumar, Quantum: Einstein, Bohr, and the great debate about the nature of reality, Norton (2009).

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]