Real matrices

[math] \newcommand{\mathds}{\mathbb}[/math]

1a. Linear maps

In what follows we are interested in symmetries, rotations, projections and other such basic transformations, in 2, 3 or even more dimensions. Such transformations appear a bit everywhere in physics. To be more precise, each physical problem or equation has some “symmetries”, and exploiting these symmetries is usually a useful thing.


Let us start with 2 dimensions, and leave 3 and more dimensions for later. The transformations of the plane [math]\mathbb R^2[/math] that we are interested in are as follows:

Definition

A map [math]f:\mathbb R^2\to\mathbb R^2[/math] is called affine when it maps lines to lines,

[[math]] f(tx+(1-t)y)=tf(x)+(1-t)f(y) [[/math]]
for any [math]x,y\in\mathbb R^2[/math] and any [math]t\in\mathbb R[/math]. If in addition [math]f(0)=0[/math], we call [math]f[/math] linear.

As a first observation, our “maps lines to lines” interpretation of the equation in the statement regards points as degenerate lines, in order for the interpretation to work when [math]x=y[/math], or when [math]f(x)=f(y)[/math]. Also, what we call a line is not exactly a set, but rather a dynamic object, think trajectory of a point on that line. We will come back to this later, once we know more about such maps.


Here are some basic examples of symmetries, all being linear in the above sense:

Proposition

The symmetries with respect to [math]Ox[/math] and [math]Oy[/math] are:

[[math]] \binom{x}{y}\to\binom{x}{-y}\quad,\quad \binom{x}{y}\to\binom{-x}{y} [[/math]]
The symmetries with respect to the [math]x=y[/math] and [math]x=-y[/math] diagonals are:

[[math]] \binom{x}{y}\to\binom{y}{x}\quad,\quad \binom{x}{y}\to\binom{-y}{-x} [[/math]]
All these maps are linear, in the above sense.


Show Proof

The fact that all these maps are linear is clear, because they map lines to lines, in our sense, and they also map [math]0[/math] to [math]0[/math]. As for the explicit formulae in the statement, these are clear as well, by drawing pictures for each of the maps involved.

Here are now some basic examples of rotations, once again all being linear:

Proposition

The rotations of angle [math]0^\circ[/math] and of angle [math]90^\circ[/math] are:

[[math]] \binom{x}{y}\to\binom{x}{y}\quad,\quad \binom{x}{y}\to\binom{-y}{x} [[/math]]
The rotations of angle [math]180^\circ[/math] and of angle [math]270^\circ[/math] are:

[[math]] \binom{x}{y}\to\binom{-x}{-y}\quad,\quad \binom{x}{y}\to\binom{y}{-x} [[/math]]
All these maps are linear, in the above sense.


Show Proof

As before, these rotations are all linear, for obvious reasons. As for the formulae in the statement, these are clear as well, by drawing pictures.

Here are some basic examples of projections, once again all being linear:

Proposition

The projections on [math]Ox[/math] and [math]Oy[/math] are:

[[math]] \binom{x}{y}\to\binom{x}{0}\quad,\quad \binom{x}{y}\to\binom{0}{y} [[/math]]
The projections on the [math]x=y[/math] and [math]x=-y[/math] diagonals are:

[[math]] \binom{x}{y}\to\frac{1}{2}\binom{x+y}{x+y}\quad,\quad \binom{x}{y}\to\frac{1}{2}\binom{x-y}{y-x} [[/math]]
All these maps are linear, in the above sense.


Show Proof

Again, these projections are all linear, and the formulae are clear as well, by drawing pictures, with only the last 2 formulae needing some explanations. In what regards the projection on the [math]x=y[/math] diagonal, the picture here is as follows:

[[math]] \xymatrix@R=15pt@C=15pt{ &\\ \circ\ar@{-}[d]\ar[u]\ar@{.}[rr]&&\bullet\ar@{.}[dd]\\ \circ\ar@{-}[d]\ar@{.}[rrr]&&&\bullet\ar[ul]\ar@{.}[d]\\ \circ\ar@{-}[uurr]\ar@{-}[rr]&\ &\circ\ar@{-}[r]&\circ\ar[rr]&&} [[/math]]

But this gives the result, since the [math]45^\circ[/math] triangle shows that this projection leaves the sum [math]x+y[/math] invariant, so both coordinates of the image must equal the average [math](x+y)/2[/math]. As for the projection on the [math]x=-y[/math] diagonal, the proof here is similar.

Finally, we have the translations, which are as follows:

Proposition

The translations are exactly the maps of the form

[[math]] \binom{x}{y}\to\binom{x+p}{y+q} [[/math]]
with [math]p,q\in\mathbb R[/math], and these maps are all affine, in the above sense.


Show Proof

A translation [math]f:\mathbb R^2\to\mathbb R^2[/math] is clearly affine, because it maps lines to lines. Also, such a translation is uniquely determined by the following vector:

[[math]] f\binom{0}{0}=\binom{p}{q} [[/math]]

To be more precise, [math]f[/math] must be the map which takes a vector [math]\binom{x}{y}[/math], and adds this vector [math]\binom{p}{q}[/math] to it. But this gives the formula in the statement.

Summarizing, we have many interesting examples of linear and affine maps. Let us develop now some general theory, for such maps. As a first result, we have:

Theorem

For a map [math]f:\mathbb R^2\to\mathbb R^2[/math], the following are equivalent:

  • [math]f[/math] is linear in our sense, mapping lines to lines, and [math]0[/math] to [math]0[/math].
  • [math]f[/math] maps sums to sums, [math]f(x+y)=f(x)+f(y)[/math], and satisfies [math]f(\lambda x)=\lambda f(x)[/math].


Show Proof

This is something which comes from definitions, as follows:


[math](1)\implies(2)[/math] We know that [math]f[/math] satisfies the following equation, and [math]f(0)=0[/math]:

[[math]] f(tx+(1-t)y)=tf(x)+(1-t)f(y) [[/math]]

By setting [math]y=0[/math], and by using our assumption [math]f(0)=0[/math], we obtain, as desired:

[[math]] f(tx)=tf(x) [[/math]]

As for the first condition, regarding sums, this can be established as follows:

[[math]] \begin{eqnarray*} f(x+y) &=&f\left(2\cdot\frac{x+y}{2}\right)\\ &=&2f\left(\frac{x+y}{2}\right)\\ &=&2\cdot\frac{f(x)+f(y)}{2}\\ &=&f(x)+f(y) \end{eqnarray*} [[/math]]


[math](2)\implies(1)[/math] Conversely now, assuming that [math]f[/math] satisfies [math]f(x+y)=f(x)+f(y)[/math] and [math]f(\lambda x)=\lambda f(x)[/math], then [math]f[/math] must map lines to lines, as shown by:

[[math]] \begin{eqnarray*} f(tx+(1-t)y) &=&f(tx)+f((1-t)y)\\ &=&tf(x)+(1-t)f(y) \end{eqnarray*} [[/math]]

Also, we have [math]f(0)=f(2\cdot 0)=2f(0)[/math], which gives [math]f(0)=0[/math], as desired.

The above result is very useful, and in practice, we will often use the condition (2) there, somewhat as a new definition for the linear maps. Let us record this as follows:

Definition (upgrade)

A map [math]f:\mathbb R^2\to\mathbb R^2[/math] is called:

  • Linear, when it satisfies [math]f(x+y)=f(x)+f(y)[/math] and [math]f(\lambda x)=\lambda f(x)[/math].
  • Affine, when it is of the form [math]f=g+w[/math], with [math]g[/math] linear, and [math]w\in\mathbb R^2[/math].

Before getting into the mathematics of linear maps, let us comment a bit more on the “maps lines to lines” feature of such maps. As mentioned after Definition 1.1, this feature requires thinking of lines as “dynamic” objects, the point being that, when thinking of lines as sets, the interpretation fails. Indeed, the following map sends set-theoretical lines onto lines, or onto points, without being affine:

[[math]] f\binom{x}{y}=\binom{x^3}{0} [[/math]]

However, in relation with all this we have the following useful result:

Theorem

For a continuous injective [math]f:\mathbb R^2\to\mathbb R^2[/math], the following are equivalent:

  • [math]f[/math] is affine in our sense, mapping lines to lines.
  • [math]f[/math] maps set-theoretical lines to set-theoretical lines.


Show Proof

By composing [math]f[/math] with a translation, we can assume that we have [math]f(0)=0[/math]. With this assumption made, the proof goes as follows:


[math](1)\implies(2)[/math] This is clear from definitions.


[math](2)\implies(1)[/math] Let us first prove that we have [math]f(x+y)=f(x)+f(y)[/math]. We do this first in the case where our vectors are not proportional, [math]x\not\sim y[/math]. In this case we have a proper parallelogram [math](0,x,y,x+y)[/math], and since [math]f[/math] was assumed to be injective, it must map parallel lines to parallel lines, and so must map our parallelogram into a parallelogram [math](0,f(x),f(y),f(x+y))[/math]. But this latter parallelogram shows that we have:

[[math]] f(x+y)=f(x)+f(y) [[/math]]

In the remaining case where our vectors are proportional, [math]x\sim y[/math], we can pick a sequence [math]x_n\to x[/math] satisfying [math]x_n\not\sim y[/math] for any [math]n[/math], and we obtain, as desired:

[[math]] \begin{eqnarray*} x_n\to x,x_n\not\sim y,\forall n &\implies&f(x_n+y)=f(x_n)+f(y),\forall n\\ &\implies&f(x+y)=f(x)+f(y) \end{eqnarray*} [[/math]]


Regarding now [math]f(\lambda x)=\lambda f(x)[/math], since [math]f[/math] maps lines to lines, it must map the line [math]0-x[/math] to the line [math]0-f(x)[/math], so we have a formula as follows, for any [math]\lambda,x[/math]:

[[math]] f(\lambda x)=\varphi_x(\lambda)f(x) [[/math]]

But since [math]f[/math] maps parallel lines to parallel lines, by Thales the function [math]\varphi_x:\mathbb R\to\mathbb R[/math] does not depend on [math]x[/math]. Thus, we have a formula as follows, for any [math]\lambda,x[/math]:

[[math]] f(\lambda x)=\varphi(\lambda)f(x) [[/math]]

We know that we have [math]\varphi(0)=0[/math] and [math]\varphi(1)=1[/math], and we must prove that we have [math]\varphi(\lambda)=\lambda[/math] for any [math]\lambda[/math]. For this purpose, we use a trick. On one hand, we have:

[[math]] f((\lambda+\mu)x)=\varphi(\lambda+\mu)f(x) [[/math]]

On the other hand, since [math]f[/math] maps sums to sums, we have as well:

[[math]] \begin{eqnarray*} f((\lambda+\mu)x) &=&f(\lambda x)+f(\mu x)\\ &=&\varphi(\lambda)f(x)+\varphi(\mu)f(x)\\ &=&(\varphi(\lambda)+\varphi(\mu))f(x) \end{eqnarray*} [[/math]]


Thus our rescaling function [math]\varphi:\mathbb R\to\mathbb R[/math] satisfies the following conditions:

[[math]] \varphi(0)=0\quad,\quad \varphi(1)=1\quad,\quad\varphi(\lambda+\mu)=\varphi(\lambda)+\varphi(\mu) [[/math]]

But with these conditions in hand, it is clear that we have [math]\varphi(\lambda)=\lambda[/math], first for all the inverses of integers, [math]\lambda=1/n[/math] with [math]n\in\mathbb N[/math], then for all rationals, [math]\lambda\in\mathbb Q[/math], and finally by continuity for all reals, [math]\lambda\in\mathbb R[/math]. Thus, we have proved the following formula:

[[math]] f(\lambda x)=\lambda f(x) [[/math]]

But this finishes the proof of [math](2)\implies(1)[/math], and we are done.

All this is very nice, and there are some further things that can be said, but getting to business, Definition 1.7 is what we need. Indeed, we have the following powerful result, stating that the linear/affine maps [math]f:\mathbb R^2\to\mathbb R^2[/math] are fully described by [math]4/6[/math] parameters:

Theorem

The linear maps [math]f:\mathbb R^2\to\mathbb R^2[/math] are precisely the maps of type

[[math]] f\binom{x}{y}=\binom{ax+by}{cx+dy} [[/math]]
and the affine maps [math]f:\mathbb R^2\to\mathbb R^2[/math] are precisely the maps of type

[[math]] f\binom{x}{y}=\binom{ax+by}{cx+dy}+\binom{p}{q} [[/math]]
with the conventions from Definition 1.7 for such maps.


Show Proof

Assuming that [math]f[/math] is linear in the sense of Definition 1.7, we have:

[[math]] \begin{eqnarray*} f\binom{x}{y} &=&f\left(\binom{x}{0}+\binom{0}{y}\right)\\ &=&f\binom{x}{0}+f\binom{0}{y}\\ &=&f\left(x\binom{1}{0}\right)+f\left(y\binom{0}{1}\right)\\ &=&xf\binom{1}{0}+yf\binom{0}{1} \end{eqnarray*} [[/math]]


Thus, we obtain the formula in the statement, with [math]a,b,c,d\in\mathbb R[/math] being given by:

[[math]] f\binom{1}{0}=\binom{a}{c}\quad,\quad f\binom{0}{1}=\binom{b}{d} [[/math]]

In the affine case now, we have, as an extra piece of data, the following vector:

[[math]] f\binom{0}{0}=\binom{p}{q} [[/math]]

Indeed, if [math]f:\mathbb R^2\to\mathbb R^2[/math] is affine, then the following map is linear:

[[math]] f-\binom{p}{q}:\mathbb R^2\to\mathbb R^2 [[/math]]

Thus, by applying the formula from the linear case to this latter map, we obtain the result.
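
The proof above also gives a concrete recipe: to find the matrix of a linear map, evaluate the map on the basis vectors, the results being the columns. Here is a minimal sketch of this recipe in Python with numpy, which is an assumption on tooling and not something from the text; the sample map chosen is the symmetry with respect to the [math]x=y[/math] diagonal:

```python
import numpy as np

# A linear map given as a plain function: the symmetry with respect to the x = y diagonal.
def f(v):
    x, y = v
    return np.array([y, x])

# As in the proof: the columns of the matrix are the images of the basis vectors.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
A = np.column_stack([f(e1), f(e2)])
print(A)                                 # [[0. 1.], [1. 0.]]

# The matrix reproduces the map on an arbitrary sample vector.
v = np.array([3.0, -2.0])
assert np.allclose(A @ v, f(v))
```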

Moving ahead now, Theorem 1.9 is all that we need for doing some non-trivial mathematics, and so in practice, that will be our “definition” for the linear and affine maps. In order to simplify all this, which might be a bit hard to memorize, the idea will be to put our parameters [math]a,b,c,d[/math] into a matrix, in the following way:

Definition

A matrix [math]A\in M_2(\mathbb R)[/math] is an array as follows:

[[math]] A=\begin{pmatrix}a&b\\ c&d\end{pmatrix} [[/math]]
These matrices act on the vectors in the following way,

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}\binom{x}{y}=\binom{ax+by}{cx+dy} [[/math]]
the rule being “multiply the rows of the matrix by the vector”.

The above multiplication formula might seem a bit complicated at first glance, but it is not. Here is an example for it, quickly worked out:

[[math]] \begin{pmatrix}1&2\\ 5&6\end{pmatrix}\binom{3}{1}=\binom{1\cdot 3+2\cdot 1}{5\cdot 3+6\cdot 1}=\binom{5}{21} [[/math]]
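
As a quick check of this rule, here is the same computation done in Python with numpy; the use of numpy here is just an illustration, not part of the text:

```python
import numpy as np

A = np.array([[1, 2],
              [5, 6]])
v = np.array([3, 1])

# "Multiply the rows of the matrix by the vector":
print(A @ v)        # [ 5 21], matching the computation above
```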

As already mentioned, all this comes from our findings from Theorem 1.9. Indeed, with the above multiplication convention for matrices and vectors, we can turn Theorem 1.9 into something much simpler, and better-looking, as follows:

Theorem

The linear maps [math]f:\mathbb R^2\to\mathbb R^2[/math] are precisely the maps of type

[[math]] f(v)=Av [[/math]]
and the affine maps [math]f:\mathbb R^2\to\mathbb R^2[/math] are precisely the maps of type

[[math]] f(v)=Av+w [[/math]]
with [math]A[/math] being a [math]2\times 2[/math] matrix, and with [math]v,w\in\mathbb R^2[/math] being vectors, written vertically.


Show Proof

With the above conventions, the formulae in Theorem 1.9 read:

[[math]] f\binom{x}{y}=\begin{pmatrix}a&b\\ c&d\end{pmatrix}\binom{x}{y} [[/math]]

[[math]] f\binom{x}{y}=\begin{pmatrix}a&b\\ c&d\end{pmatrix}\binom{x}{y}+\binom{p}{q} [[/math]]

But these are exactly the formulae in the statement, with:

[[math]] A=\begin{pmatrix}a&b\\ c&d\end{pmatrix}\quad,\quad v=\binom{x}{y}\quad,\quad w=\binom{p}{q} [[/math]]

Thus, we have proved our theorem.

Before going further, let us discuss some examples. First, we have:

Proposition

The symmetries with respect to [math]Ox[/math] and [math]Oy[/math] are given by:

[[math]] \begin{pmatrix}1&0\\0&-1\end{pmatrix}\binom{x}{y}\quad,\quad \begin{pmatrix}-1&0\\0&1\end{pmatrix}\binom{x}{y} [[/math]]
The symmetries with respect to the [math]x=y[/math] and [math]x=-y[/math] diagonals are given by:

[[math]] \begin{pmatrix}0&1\\1&0\end{pmatrix}\binom{x}{y}\quad,\quad \begin{pmatrix}0&-1\\-1&0\end{pmatrix}\binom{x}{y} [[/math]]


Show Proof

According to Proposition 1.2, the above transformations map [math]\binom{x}{y}[/math] to:

[[math]] \binom{x}{-y} \quad,\quad\binom{-x}{y} \quad,\quad\binom{y}{x} \quad,\quad\binom{-y}{-x} [[/math]]

But this gives the formulae in the statement, by guessing in each case the matrix which does the job, in the obvious way.

Regarding now the basic rotations, we have here:

Proposition

The rotations of angle [math]0^\circ[/math] and of angle [math]90^\circ[/math] are given by:

[[math]] \begin{pmatrix}1&0\\0&1\end{pmatrix}\binom{x}{y}\quad,\quad \begin{pmatrix}0&-1\\1&0\end{pmatrix}\binom{x}{y} [[/math]]
The rotations of angle [math]180^\circ[/math] and of angle [math]270^\circ[/math] are given by:

[[math]] \begin{pmatrix}-1&0\\0&-1\end{pmatrix}\binom{x}{y}\quad,\quad \begin{pmatrix}0&1\\-1&0\end{pmatrix}\binom{x}{y} [[/math]]


Show Proof

As before, but by using Proposition 1.3, the vector [math]\binom{x}{y}[/math] maps to:

[[math]] \binom{x}{y} \quad,\quad\binom{-y}{x} \quad,\quad\binom{-x}{-y} \quad,\quad\binom{y}{-x} [[/math]]

But this gives the formulae in the statement, as before by guessing the matrix.

Finally, regarding the basic projections, we have here:

Proposition

The projections on [math]Ox[/math] and [math]Oy[/math] are given by:

[[math]] \begin{pmatrix}1&0\\0&0\end{pmatrix}\binom{x}{y}\quad,\quad \begin{pmatrix}0&0\\0&1\end{pmatrix}\binom{x}{y} [[/math]]
The projections on the [math]x=y[/math] and [math]x=-y[/math] diagonals are given by:

[[math]] \frac{1}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix}\binom{x}{y}\quad,\quad \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}\binom{x}{y} [[/math]]


Show Proof

As before, but according now to Proposition 1.4, the vector [math]\binom{x}{y}[/math] maps to:

[[math]] \binom{x}{0} \quad,\quad\binom{0}{y} \quad,\quad\frac{1}{2}\binom{x+y}{x+y} \quad,\quad\frac{1}{2}\binom{x-y}{y-x} [[/math]]

But this gives the formulae in the statement, as before by guessing the matrix.

In addition to the above transformations, there are many other examples. We have for instance the null transformation, which is given by:

[[math]] \begin{pmatrix}0&0\\0&0\end{pmatrix}\binom{x}{y}=\binom{0}{0} [[/math]]

Here is now a more bizarre map, which can still be understood, however, as being the map which “switches the coordinates, then kills the second one”:

[[math]] \begin{pmatrix}0&1\\0&0\end{pmatrix}\binom{x}{y}=\binom{y}{0} [[/math]]

Even more bizarrely now, here is a certain linear map, whose interpretation is more complicated, and is left to you, reader:

[[math]] \begin{pmatrix}1&1\\0&0\end{pmatrix}\binom{x}{y}=\binom{x+y}{0} [[/math]]

And here is another linear map, which once again, being something geometric, in 2 dimensions, can definitely be understood, at least in theory:

[[math]] \begin{pmatrix}1&1\\0&1\end{pmatrix}\binom{x}{y}=\binom{x+y}{y} [[/math]]

Let us discuss now the computation of arbitrary symmetries, rotations and projections. We begin with the rotations, whose formula is a must-know:

Theorem

The rotation of angle [math]t\in\mathbb R[/math] is given by the matrix

[[math]] R_t=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix} [[/math]]
depending on [math]t\in\mathbb R[/math] taken modulo [math]2\pi[/math].


Show Proof

The rotation being linear, it must correspond to a certain matrix:

[[math]] R_t=\begin{pmatrix}a&b\\ c&d\end{pmatrix} [[/math]]

We can guess this matrix, via its action on the basic coordinate vectors [math]\binom{1}{0}[/math] and [math]\binom{0}{1}[/math]. A quick picture shows that we must have:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}1\\ 0\end{pmatrix}= \begin{pmatrix}\cos t\\ \sin t\end{pmatrix} [[/math]]

Also, by paying attention to positives and negatives, we must have:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}0\\ 1\end{pmatrix}= \begin{pmatrix}-\sin t\\ \cos t\end{pmatrix} [[/math]]

Guessing now the matrix is not complicated, because the first equation gives us the first column, and the second equation gives us the second column:

[[math]] \binom{a}{c}=\begin{pmatrix}\cos t\\ \sin t\end{pmatrix}\quad,\quad \binom{b}{d}=\begin{pmatrix}-\sin t\\ \cos t\end{pmatrix} [[/math]]

Thus, we can just put together these two vectors, and we obtain our matrix.

Regarding now the symmetries, the formula here is as follows:

Theorem

The symmetry with respect to the [math]Ox[/math] axis rotated by an angle [math]t/2\in\mathbb R[/math] is given by the matrix

[[math]] S_t=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix} [[/math]]
depending on [math]t\in\mathbb R[/math] taken modulo [math]2\pi[/math].


Show Proof

As before, we can guess the matrix via its action on the basic coordinate vectors [math]\binom{1}{0}[/math] and [math]\binom{0}{1}[/math]. A quick picture shows that we must have:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}1\\ 0\end{pmatrix}= \begin{pmatrix}\cos t\\ \sin t\end{pmatrix} [[/math]]

Also, by paying attention to positives and negatives, we must have:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}0\\ 1\end{pmatrix}= \begin{pmatrix}\sin t\\-\cos t\end{pmatrix} [[/math]]

Guessing now the matrix is not complicated, because we must have:

[[math]] \binom{a}{c}=\begin{pmatrix}\cos t\\ \sin t\end{pmatrix}\quad,\quad \binom{b}{d}=\begin{pmatrix}\sin t\\-\cos t\end{pmatrix} [[/math]]

Thus, we can just put together these two vectors, and we obtain our matrix.

Finally, regarding the projections, the formula here is as follows:

Theorem

The projection on the [math]Ox[/math] axis rotated by an angle [math]t/2\in\mathbb R[/math] is given by the matrix

[[math]] P_t=\frac{1}{2}\begin{pmatrix}1+\cos t&\sin t\\ \sin t&1-\cos t\end{pmatrix} [[/math]]
depending on [math]t\in\mathbb R[/math] taken modulo [math]2\pi[/math].


Show Proof

We will need here some trigonometry, and more precisely the formulae for the duplication of the angles. Regarding the sine, the formula here is:

[[math]] \sin(2t)=2\sin t\cos t [[/math]]

Regarding the cosine, we have here 3 equivalent formulae, as follows:

[[math]] \begin{eqnarray*} \cos(2t) &=&\cos^2t-\sin^2t\\ &=&2\cos^2t-1\\ &=&1-2\sin^2t \end{eqnarray*} [[/math]]


Getting back now to our problem, some quick pictures, using similarity of triangles, and then the above trigonometry formulae, show that we must have:

[[math]] P_t\begin{pmatrix}1\\ 0\end{pmatrix} =\cos\frac{t}{2}\binom{\cos\frac{t}{2}}{\sin\frac{t}{2}} =\frac{1}{2}\begin{pmatrix}1+\cos t\\ \sin t\end{pmatrix} [[/math]]

[[math]] P_t\begin{pmatrix}0\\ 1\end{pmatrix} =\sin\frac{t}{2}\binom{\cos\frac{t}{2}}{\sin\frac{t}{2}} =\frac{1}{2}\begin{pmatrix}\sin t\\1-\cos t\end{pmatrix} [[/math]]

Now by putting together these two vectors, we obtain our matrix.
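
Since the above proof relies on pictures, a small numerical sanity check can be reassuring. The following sketch, in Python with numpy (tooling assumed, not part of the text), verifies that [math]P_t[/math] fixes the rotated axis, kills its orthogonal, and satisfies [math]P_t^2=P_t[/math]:

```python
import numpy as np

def P(t):
    # Projection on the Ox axis rotated by t/2, as in the statement.
    return 0.5 * np.array([[1 + np.cos(t), np.sin(t)],
                           [np.sin(t),     1 - np.cos(t)]])

t = 0.7
u = np.array([np.cos(t / 2), np.sin(t / 2)])      # unit vector spanning the rotated axis
w = np.array([-np.sin(t / 2), np.cos(t / 2)])     # orthogonal unit vector

assert np.allclose(P(t) @ u, u)          # the axis is fixed
assert np.allclose(P(t) @ w, 0)          # its orthogonal is killed
assert np.allclose(P(t) @ P(t), P(t))    # P_t is indeed a projection
```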

1b. Matrix calculus

In order to formulate now our second theorem, dealing with compositions of maps, let us make the following multiplication convention, between matrices and matrices:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix} \begin{pmatrix}p&q\\ r&s\end{pmatrix} =\begin{pmatrix}ap+br&aq+bs\\ cp+dr&cq+ds\end{pmatrix} [[/math]]

This might look a bit complicated, but as before, with the multiplication of matrices and vectors, the idea is very simple, namely “multiply the rows of the first matrix by the columns of the second matrix”. With this convention, we have:

Theorem

If we denote by [math]f_A:\mathbb R^2\to\mathbb R^2[/math] the linear map associated to a matrix [math]A[/math], given by the formula

[[math]] f_A(v)=Av [[/math]]
then we have the following multiplication formula for such maps:

[[math]] f_Af_B=f_{AB} [[/math]]
That is, the composition of linear maps corresponds to the multiplication of matrices.


Show Proof

We want to prove that we have the following formula, valid for any two matrices [math]A,B\in M_2(\mathbb R)[/math], and any vector [math]v\in\mathbb R^2[/math]:

[[math]] A(Bv)=(AB)v [[/math]]

For this purpose, let us write our matrices and vector as follows:

[[math]] A=\begin{pmatrix}a&b\\ c&d\end{pmatrix}\quad,\quad B=\begin{pmatrix}p&q\\ r&s\end{pmatrix}\quad,\quad v=\binom{x}{y} [[/math]]

The formula that we want to prove becomes:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix} \left[ \begin{pmatrix}p&q\\ r&s\end{pmatrix} \binom{x}{y} \right]= \left[\begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}p&q\\ r&s\end{pmatrix}\right] \binom{x}{y} [[/math]]

But this is the same as saying that:

[[math]] \begin{pmatrix}a&b\\ c&d\end{pmatrix} \binom{px+qy}{rx+sy}= \begin{pmatrix}ap+br&aq+bs\\ cp+dr&cq+ds\end{pmatrix} \binom{x}{y} [[/math]]

And this latter formula does hold indeed, because on both sides we get:

[[math]] \binom{apx+aqy+brx+bsy}{cpx+cqy+drx+dsy} [[/math]]

Thus, we have proved the result.

As a verification for the above result, let us compose two rotations. The computation here is as follows, yielding a rotation, as it should, of the correct angle:

[[math]] \begin{eqnarray*} R_sR_t &=&\begin{pmatrix}\cos s&-\sin s\\ \sin s&\cos s\end{pmatrix}\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\\ &=&\begin{pmatrix}\cos s\cos t-\sin s\sin t&-\cos s\sin t-\sin s\cos t\\ \sin s\cos t+\cos s\sin t&-\sin s\sin t+\cos s\cos t\end{pmatrix}\\ &=&\begin{pmatrix}\cos(s+t)&-\sin(s+t)\\ \sin(s+t)&\cos(s+t)\end{pmatrix}\\ &=&R_{s+t} \end{eqnarray*} [[/math]]
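
This identity can also be confirmed numerically; here is a small sketch in Python with numpy, given as an illustration only:

```python
import numpy as np

def R(t):
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

s, t = 0.3, 1.1
assert np.allclose(R(s) @ R(t), R(s + t))   # composing rotations adds the angles
```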


We are ready now to pass to 3 dimensions. The idea is to select what we learned in 2 dimensions, nice results only, and generalize to 3 dimensions. We obtain:

Theorem

Consider a map [math]f:\mathbb R^3\to\mathbb R^3[/math].

  • [math]f[/math] is linear when it is of the form [math]f(v)=Av[/math], with [math]A\in M_3(\mathbb R)[/math].
  • [math]f[/math] is affine when [math]f(v)=Av+w[/math], with [math]A\in M_3(\mathbb R)[/math] and [math]w\in\mathbb R^3[/math].
  • We have the composition formula [math]f_Af_B=f_{AB}[/math], similar to the [math]2D[/math] one.


Show Proof

Here (1,2) can be proved exactly as in the 2D case, with the multiplication convention being as usual, “multiply the rows of the matrix by the vector”:

[[math]] \begin{pmatrix}a&b&c\\ d&e&f\\ g&h&i\end{pmatrix} \begin{pmatrix}x\\ y\\ z\end{pmatrix} =\begin{pmatrix}ax+by+cz\\ dx+ey+fz\\ gx+hy+iz\end{pmatrix} [[/math]]

As for (3), once again the 2D idea applies, with the same product rule, “multiply the rows of the first matrix by the columns of the second matrix”:

[[math]] \begin{pmatrix}a&b&c\\ d&e&f\\ g&h&i\end{pmatrix} \begin{pmatrix}p&q&r\\ s&t&u\\ v&w&x\end{pmatrix}\\ =\begin{pmatrix} ap+bs+cv&aq+bt+cw&ar+bu+cx\\ dp+es+fv&dq+et+fw&dr+eu+fx\\ gp+hs+iv&gq+ht+iw&gr+hu+ix \end{pmatrix} [[/math]]

Thus, we have proved our theorem. Of course, we are going a bit fast here, and some verifications are missing, but we will discuss all this in detail, in [math]N[/math] dimensions.

We are now ready to discuss 4 and more dimensions. Before doing so, let us point out however that the maps of type [math]f:\mathbb R^3\to\mathbb R^2[/math], or [math]f:\mathbb R\to\mathbb R^2[/math], and so on, are not covered by our results. Since there are many interesting such maps, say obtained by projecting and then rotating, and so on, we will be interested here in the maps [math]f:\mathbb R^N\to\mathbb R^M[/math].


A bit of thinking suggests that such maps should come from the [math]M\times N[/math] matrices. Indeed, this is what happens at [math]M=N=2[/math] and [math]M=N=3[/math], of course. But this happens as well at [math]N=1[/math], because a linear map [math]f:\mathbb R\to\mathbb R^M[/math] can only be something of the form [math]f(\lambda)=\lambda v[/math], with [math]v\in\mathbb R^M[/math]. But [math]v\in\mathbb R^M[/math] means that [math]v[/math] is an [math]M\times 1[/math] matrix. So, let us start with the product rule for such matrices, which is as follows:

Definition

We can multiply the [math]M\times N[/math] matrices with [math]N\times K[/math] matrices,

[[math]] \begin{pmatrix} a_{11}&\ldots&a_{1N}\\ \vdots&&\vdots\\ a_{M1}&\ldots&a_{MN} \end{pmatrix} \begin{pmatrix} b_{11}&\ldots&b_{1K}\\ \vdots&&\vdots\\ b_{N1}&\ldots&b_{NK} \end{pmatrix} [[/math]]
the product being the [math]M\times K[/math] matrix given by the following formula,

[[math]] \begin{pmatrix} a_{11}b_{11}+\ldots+a_{1N}b_{N1}&\ldots\ldots&a_{11}b_{1K}+\ldots+a_{1N}b_{NK}\\ \vdots&&\vdots\\ \vdots&&\vdots\\ a_{M1}b_{11}+\ldots+a_{MN}b_{N1}&\ldots\ldots&a_{M1}b_{1K}+\ldots+a_{MN}b_{NK} \end{pmatrix} [[/math]]
obtained via the usual rule “multiply rows by columns”.

Observe that this formula generalizes all the multiplication rules that we have been using so far, between various types of matrices and vectors. Thus, in practice, we can forget all the previous multiplication rules, and simply memorize this one.


In case the above formula looks hard to memorize, here is an alternative formulation of it, which is simpler and more powerful, by using the standard algebraic notation for the matrices, [math]A=(A_{ij})[/math], that we will heavily use, in what follows:

Proposition

The matrix multiplication is given by formula

[[math]] (AB)_{ij}=\sum_kA_{ik}B_{kj} [[/math]]
with [math]A_{ij}[/math] standing for the entry of [math]A[/math] at row [math]i[/math] and column [math]j[/math].


Show Proof

This is indeed just a shorthand for the formula in Definition 1.20, by following the rule there, namely “multiply the rows of [math]A[/math] by the columns of [math]B[/math]”.

As an illustration of the power of the convention in Proposition 1.21, we have:

Proposition

We have the following formula, valid for any matrices [math]A,B,C[/math],

[[math]] (AB)C=A(BC) [[/math]]
provided that the sizes of our matrices [math]A,B,C[/math] fit.


Show Proof

We have the following computation, using indices as above:

[[math]] ((AB)C)_{ij} =\sum_k(AB)_{ik}C_{kj} =\sum_{kl}A_{il}B_{lk}C_{kj} [[/math]]

On the other hand, we have as well the following computation:

[[math]] (A(BC))_{ij} =\sum_lA_{il}(BC)_{lj} =\sum_{kl}A_{il}B_{lk}C_{kj} [[/math]]

Thus we have [math](AB)C=A(BC)[/math], and we have proved our result.
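
To make the index formula concrete, here is a naive implementation of it in Python, checked against numpy's built-in product, together with a numerical test of associativity; this is an illustration only, with randomly generated matrices:

```python
import numpy as np

def matmul(A, B):
    # (AB)_{ij} = sum_k A_{ik} B_{kj}, exactly as in the formula above.
    M, N = A.shape
    N2, K = B.shape
    assert N == N2, "sizes must fit"
    C = np.zeros((M, K))
    for i in range(M):
        for j in range(K):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(N))
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

assert np.allclose(matmul(A, B), A @ B)                                 # matches numpy
assert np.allclose(matmul(matmul(A, B), C), matmul(A, matmul(B, C)))    # (AB)C = A(BC)
```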

We can now talk about linear maps between spaces of arbitrary dimension, generalizing what we have been doing so far. The main result here is as follows:

Theorem

Consider a map [math]f:\mathbb R^N\to\mathbb R^M[/math].

  • [math]f[/math] is linear when it is of the form [math]f(v)=Av[/math], with [math]A\in M_{M\times N}(\mathbb R)[/math].
  • [math]f[/math] is affine when [math]f(v)=Av+w[/math], with [math]A\in M_{M\times N}(\mathbb R)[/math] and [math]w\in\mathbb R^M[/math].
  • We have the composition formula [math]f_Af_B=f_{AB}[/math], whenever the sizes fit.


Show Proof

We already know that this happens at [math]M=N=2[/math], and at [math]M=N=3[/math] as well. In general, the proof is similar, by doing some elementary computations.

As a first example here, we have the identity matrix, acting as the identity:

[[math]] \begin{pmatrix} 1&&0\\ &\ddots\\ 0&&1\end{pmatrix} \begin{pmatrix}x_1\\ \vdots\\ x_N\end{pmatrix} =\begin{pmatrix}x_1\\ \vdots\\ x_N\end{pmatrix} [[/math]]

We have as well the null matrix, acting as the null map:

[[math]] \begin{pmatrix} 0&\ldots&0\\ \vdots&&\vdots\\ 0&\ldots&0\end{pmatrix} \begin{pmatrix}x_1\\ \vdots\\ x_N\end{pmatrix} =\begin{pmatrix}0\\ \vdots\\ 0\end{pmatrix} [[/math]]

Here is now an important result, providing us with many examples:

Proposition

The diagonal matrices act as follows:

[[math]] \begin{pmatrix} \lambda_1&&0\\ &\ddots\\ 0&&\lambda_N\end{pmatrix} \begin{pmatrix}x_1\\ \vdots\\ x_N\end{pmatrix} =\begin{pmatrix}\lambda_1x_1\\ \vdots\\ \lambda_Nx_N\end{pmatrix} [[/math]]


Show Proof

This is clear, indeed, from definitions.

As a more specialized example now, we have:

Proposition

The flat matrix, which is as follows,

[[math]] \mathbb I_N=\begin{pmatrix} 1&\ldots&1\\ \vdots&&\vdots\\ 1&\ldots&1\end{pmatrix} [[/math]]
acts as [math]N[/math] times the orthogonal projection onto the all-one vector.


Show Proof

The flat matrix acts in the following way:

[[math]] \begin{pmatrix} 1&\ldots&1\\ \vdots&&\vdots\\ 1&\ldots&1\end{pmatrix} \begin{pmatrix}x_1\\ \vdots\\ x_N\end{pmatrix} =\begin{pmatrix}x_1+\ldots+x_N\\ \vdots\\ x_1+\ldots+x_N\end{pmatrix} [[/math]]

Thus, in terms of the matrix [math]P=\mathbb I_N/N[/math], we have the following formula:

[[math]] P\begin{pmatrix}x_1\\ \vdots\\ x_N\end{pmatrix} =\frac{x_1+\ldots+x_N}{N}\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} [[/math]]

Since the linear map [math]f(x)=Px[/math] satisfies [math]f^2=f[/math], and since [math]Im(f)[/math] consists of the scalar multiples of the all-one vector [math]\xi\in\mathbb R^N[/math], we conclude that [math]f[/math] is a projection on [math]\mathbb R\xi[/math]. Also, with the standard scalar product convention [math] \lt x,y \gt =\sum x_iy_i[/math], we have:

[[math]] \begin{eqnarray*} \lt f(x)-x,\xi \gt &=& \lt f(x),\xi \gt - \lt x,\xi \gt \\ &=&\frac{\sum x_i}{N}\times N-\sum x_i\\ &=&0 \end{eqnarray*} [[/math]]


Thus, our projection is an orthogonal projection, and we are done.
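
Here is a quick numerical confirmation of the above, in Python with numpy, given as an illustration only: the matrix [math]P=\mathbb I_N/N[/math] is idempotent and symmetric, and it sends every vector to the corresponding multiple of the all-one vector.

```python
import numpy as np

N = 5
P = np.ones((N, N)) / N               # the matrix P = I_N / N from the proof

x = np.arange(1.0, N + 1)             # sample vector (1, 2, ..., N)
assert np.allclose(P @ P, P)          # P^2 = P: a projection
assert np.allclose(P, P.T)            # symmetric, so the projection is orthogonal
assert np.allclose(P @ x, np.mean(x) * np.ones(N))   # image = average times the all-one vector
```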

1c. Diagonalization

Let us develop now some general theory for the square matrices. We will need the following standard result, regarding the changes of coordinates in [math]\mathbb R^N[/math]:

Theorem

For a system [math]\{v_1,\ldots,v_N\}\subset\mathbb R^N[/math], the following are equivalent:

  • The vectors [math]v_i[/math] form a basis of [math]\mathbb R^N[/math], in the sense that each vector [math]x\in\mathbb R^N[/math] can be written in a unique way as a linear combination of these vectors:
    [[math]] x=\sum\lambda_iv_i [[/math]]
  • The following linear map associated to these vectors is bijective:
    [[math]] f:\mathbb R^N\to\mathbb R^N\quad,\quad \lambda\to\sum\lambda_iv_i [[/math]]
  • The matrix formed by these vectors, regarded as usual as column vectors,
    [[math]] P=[v_1,\ldots,v_N]\in M_N(\mathbb R) [[/math]]
    is invertible, with respect to the usual multiplication of the matrices.


Show Proof

Here the equivalence [math](1)\iff(2)[/math] is clear from definitions, and the equivalence [math](2)\iff(3)[/math] is clear as well, because we have [math]f(x)=Px[/math].

Getting back now to the matrices, as an important definition, we have:

Definition

Let [math]A\in M_N(\mathbb R)[/math] be a square matrix. We say that [math]v\in\mathbb R^N[/math] is an eigenvector of [math]A[/math], with corresponding eigenvalue [math]\lambda\in\mathbb R[/math], when:

[[math]] Av=\lambda v [[/math]]
Also, we say that [math]A[/math] is diagonalizable when [math]\mathbb R^N[/math] has a basis formed by eigenvectors of [math]A[/math].

We will see in a moment examples of eigenvectors and eigenvalues, and of diagonalizable matrices. However, even before seeing the examples, it is quite clear that these are key notions. Indeed, for a matrix [math]A\in M_N(\mathbb R)[/math], being diagonalizable is the best thing that can happen, because in this case, once the basis changed, [math]A[/math] becomes diagonal.


To be more precise here, we have the following result:

Proposition

Assuming that [math]A\in M_N(\mathbb R)[/math] is diagonalizable, we have the formula

[[math]] A=\begin{pmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N \end{pmatrix} [[/math]]
with respect to the basis [math]\{v_1,\ldots,v_N\}[/math] of [math]\mathbb R^N[/math] consisting of eigenvectors of [math]A[/math].


Show Proof

This is clear from the definition of eigenvalues and eigenvectors, and from the formula of linear maps associated to diagonal matrices, from Proposition 1.24.

Here is an equivalent form of the above result, which is often used in practice, when we prefer not to change the basis, and stay with the usual basis of [math]\mathbb R^N[/math]:

Theorem

Assuming that [math]A\in M_N(\mathbb R)[/math] is diagonalizable, with

[[math]] v_1,\ldots,v_N\in\mathbb R^N\quad,\quad \lambda_1,\ldots,\lambda_N\in\mathbb R [[/math]]
as eigenvectors and corresponding eigenvalues, we have the formula

[[math]] A=PDP^{-1} [[/math]]
with the matrices [math]P,D\in M_N(\mathbb R)[/math] being given by the formulae

[[math]] P=[v_1,\ldots,v_N]\quad,\quad D=diag(\lambda_1,\ldots,\lambda_N) [[/math]]
and respectively called passage matrix, and diagonal form of [math]A[/math].


Show Proof

This can be viewed in two possible ways, as follows:


(1) As already mentioned, with respect to the basis [math]v_1,\ldots,v_N\in\mathbb R^N[/math] formed by the eigenvectors, our matrix [math]A[/math] is given by:

[[math]] A=\begin{pmatrix} \lambda_1\\ &\ddots\\ &&\lambda_N \end{pmatrix} [[/math]]

But this corresponds precisely to the formula [math]A=PDP^{-1}[/math] from the statement, with [math]P[/math] and its inverse appearing there due to our change of basis.


(2) We can equally establish the formula in the statement by a direct computation. Indeed, we have [math]Pe_i=v_i[/math], where [math]\{e_1,\ldots,e_N\}[/math] is the standard basis of [math]\mathbb R^N[/math], and so:

[[math]] APe_i =Av_i =\lambda_iv_i [[/math]]

On the other hand, once again by using [math]Pe_i=v_i[/math], we have as well:

[[math]] PDe_i =P\lambda_ie_i =\lambda_iPe_i =\lambda_iv_i [[/math]]

Thus we have [math]AP=PD[/math], and so [math]A=PDP^{-1}[/math], as claimed.
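
As a numerical illustration of the formula [math]A=PDP^{-1}[/math], here is a sketch in Python with numpy, using a symmetric [math]2\times2[/math] matrix whose eigenvectors can be guessed by hand; the matrix chosen is not from the text, it is just a convenient example:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
v1 = np.array([1.0, 1.0])     # A v1 = 4 v1
v2 = np.array([1.0, -1.0])    # A v2 = 2 v2

assert np.allclose(A @ v1, 4 * v1)
assert np.allclose(A @ v2, 2 * v2)

P = np.column_stack([v1, v2])         # passage matrix, eigenvectors as columns
D = np.diag([4.0, 2.0])               # diagonal form

assert np.allclose(A, P @ D @ np.linalg.inv(P))   # A = P D P^{-1}
```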

Let us discuss now some basic examples, namely the rotations, symmetries and projections in 2 dimensions. The situation is very simple for the projections, as follows:

Proposition

The projection on the [math]Ox[/math] axis rotated by an angle [math]t/2\in\mathbb R[/math],

[[math]] P_t=\frac{1}{2}\begin{pmatrix}1+\cos t&\sin t\\ \sin t&1-\cos t\end{pmatrix} [[/math]]
is diagonalizable, its diagonal form being as follows:

[[math]] P_t\sim\begin{pmatrix}1&0\\0&0\end{pmatrix} [[/math]]


Show Proof

This is clear, because if we denote by [math]L[/math] the line where our projection projects, we can pick any vector [math]v\in L[/math], and this will be an eigenvector with eigenvalue 1, and then pick any vector [math]w\in L^\perp[/math], and this will be an eigenvector with eigenvalue 0. Thus, even without computations, we are led to the conclusion in the statement.

The computation for the symmetries is similar, as follows:

Proposition

The symmetry with respect to the [math]Ox[/math] axis rotated by [math]t/2\in\mathbb R[/math],

[[math]] S_t=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix} [[/math]]
is diagonalizable, its diagonal form being as follows:

[[math]] S_t\sim\begin{pmatrix}1&0\\0&-1\end{pmatrix} [[/math]]


Show Proof

This is once again clear, because if we denote by [math]L[/math] the line with respect to which our symmetry symmetrizes, we can pick any vector [math]v\in L[/math], and this will be an eigenvector with eigenvalue 1, and then pick any vector [math]w\in L^\perp[/math], and this will be an eigenvector with eigenvalue [math]-1[/math]. Thus, we are led to the conclusion in the statement.

Regarding now the rotations, here the situation is different, as follows:

Proposition

The rotation of angle [math]t\in[0,2\pi)[/math], given by the formula

[[math]] R_t=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix} [[/math]]
is diagonal at [math]t=0,\pi[/math], and is not diagonalizable at [math]t\neq0,\pi[/math].


Show Proof

The first assertion is clear, because at [math]t=0,\pi[/math] the rotations are:

[[math]] R_0=\begin{pmatrix}1&0\\0&1\end{pmatrix}\quad,\quad R_\pi=\begin{pmatrix}-1&0\\0&-1\end{pmatrix} [[/math]]

As for the rotations of angle [math]t\neq0,\pi[/math], these cannot have eigenvectors, because such a rotation maps no line through the origin to itself.

Finally, here is one more example, which is the most important of them all:

Theorem

The following matrix is not diagonalizable,

[[math]] J=\begin{pmatrix}0&1\\0&0\end{pmatrix} [[/math]]
because it has only [math]1[/math] eigenvector, up to rescaling.


Show Proof

The above matrix, called [math]J[/math] in homage to Jordan, acts as follows:

[[math]] \begin{pmatrix}0&1\\0&0\end{pmatrix}\binom{x}{y}=\binom{y}{0} [[/math]]

Thus the eigenvector/eigenvalue equation [math]Jv=\lambda v[/math] reads:

[[math]] \binom{y}{0}=\binom{\lambda x}{\lambda y} [[/math]]

We have then two cases, depending on [math]\lambda[/math], as follows, which give the result:


(1) For [math]\lambda\neq0[/math] we must have [math]y=0[/math], coming from the second row, and so [math]x=0[/math] as well, coming from the first row, so we have no nontrivial eigenvectors.


(2) As for the case [math]\lambda=0[/math], here we must have [math]y=0[/math], coming from the first row, and so the eigenvectors here are the vectors of the form [math]\binom{x}{0}[/math].
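
The same conclusion can be checked numerically; here is a sketch in Python with numpy, given as an illustration only:

```python
import numpy as np

J = np.array([[0.0, 1.0],
              [0.0, 0.0]])

print(np.linalg.eigvals(J))     # [0. 0.]: the only eigenvalue is 0, with multiplicity 2

# As in the proof, the eigenvector equation Jv = 0 forces the second coordinate to vanish,
# so all eigenvectors are multiples of (1, 0), and they cannot form a basis of R^2:
assert np.allclose(J @ np.array([1.0, 0.0]), 0)        # (1, 0) is a 0-eigenvector
assert not np.allclose(J @ np.array([0.0, 1.0]), 0)    # (0, 1) is not an eigenvector
```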

1d. Scalar products

In order to discuss some interesting examples of matrices, and their diagonalization, in arbitrary dimensions, we will need the following standard fact:

Proposition

Consider the scalar product on [math]\mathbb R^N[/math], given by:

[[math]] \lt x,y \gt =\sum_ix_iy_i [[/math]]

We have then the following formula, valid for any vectors [math]x,y[/math] and any matrix [math]A[/math],

[[math]] \lt Ax,y \gt = \lt x,A^ty \gt [[/math]]
with [math]A^t[/math] being the transpose matrix.


Show Proof

By linearity, it is enough to prove the above formula on the standard basis vectors [math]e_1,\ldots,e_N[/math] of [math]\mathbb R^N[/math]. Thus, we want to prove that for any [math]i,j[/math] we have:

[[math]] \lt Ae_j,e_i \gt = \lt e_j,A^te_i \gt [[/math]]

The scalar product being symmetric, this is the same as proving that:

[[math]] \lt Ae_j,e_i \gt = \lt A^te_i,e_j \gt [[/math]]

On the other hand, for any matrix [math]M[/math] we have the following formula:

[[math]] M_{ij}= \lt Me_j,e_i \gt [[/math]]

Thus, the formula to be proved simply reads:

[[math]] A_{ij}=(A^t)_{ji} [[/math]]

But this is precisely the definition of [math]A^t[/math], and we are done.

With this, we can develop some theory. We first have:

Theorem

The orthogonal projections are the matrices satisfying:

[[math]] P^2=P=P^t [[/math]]
These projections are diagonalizable, with eigenvalues [math]0,1[/math].


Show Proof

It is obvious that a linear map [math]f(x)=Px[/math] is a projection precisely when:

[[math]] P^2=P [[/math]]

In order now for this projection to be an orthogonal projection, the vector [math]Px-x[/math] must be orthogonal to the image of [math]P[/math], and so to all the vectors [math]Px-Py[/math]. This condition can be written and then processed as follows:

[[math]] \begin{eqnarray*} \lt Px-Py,Px-x \gt =0 &\iff& \lt x-y,P^tPx-P^tx \gt =0\\ &\iff&P^tPx-P^tx=0\\ &\iff&P^tP-P^t=0 \end{eqnarray*} [[/math]]


Thus we must have [math]P^t=P^tP[/math]. Now observe that by transposing, we have as well:

[[math]] P =(P^tP)^t =P^t(P^t)^t =P^tP [[/math]]

Thus we must have [math]P=P^t[/math], as claimed. Finally, regarding the diagonalization assertion, this is clear by taking a basis of [math]Im(f)[/math], which consists of [math]1[/math]-eigenvectors, and then completing with 0-eigenvectors, which can be found inside the orthogonal of [math]Im(f)[/math].

Here is now a key computation of such projections:

Theorem

The rank [math]1[/math] projections, namely the orthogonal projections on the lines [math]\mathbb Rx\subset\mathbb R^N[/math], are given by the formula

[[math]] P_x=\frac{1}{||x||^2}(x_ix_j)_{ij} [[/math]]
where the normalizing quantity, namely

[[math]] ||x||=\sqrt{\sum_ix_i^2} [[/math]]
is the length of the vector.


Show Proof

Consider a vector [math]y\in\mathbb R^N[/math]. Its projection on [math]\mathbb Rx[/math] must be a certain multiple [math]\alpha x[/math] of [math]x[/math], with the remainder [math]y-\alpha x[/math] being orthogonal to [math]x[/math], which gives [math]\alpha= \lt y,x \gt / \lt x,x \gt [/math]. Thus, we are led to the following formula:

[[math]] P_xy =\frac{ \lt y,x \gt }{ \lt x,x \gt }\,x =\frac{1}{||x||^2} \lt y,x \gt x [[/math]]

With this in hand, we can now compute the entries of [math]P_x[/math], as follows:

[[math]] \begin{eqnarray*} (P_x)_{ij} &=& \lt P_xe_j,e_i \gt \\ &=&\frac{1}{||x||^2} \lt e_j,x \gt \lt x,e_i \gt \\ &=&\frac{x_jx_i}{||x||^2} \end{eqnarray*} [[/math]]


Thus, we are led to the formula in the statement.
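
The above formula is easy to test numerically; here is a sketch in Python with numpy, given as an illustration only, building [math]P_x[/math] as the rescaled matrix [math](x_ix_j)_{ij}[/math] and checking the projection relations:

```python
import numpy as np

def rank_one_projection(x):
    # P_x = (x_i x_j)_{ij} / ||x||^2, as in the statement.
    return np.outer(x, x) / np.dot(x, x)

x = np.array([1.0, 2.0, 2.0])
P = rank_one_projection(x)

assert np.allclose(P @ P, P)      # P^2 = P
assert np.allclose(P, P.T)        # P^t = P
assert np.allclose(P @ x, x)      # x itself is fixed

y = np.array([3.0, -1.0, 0.5])
assert np.allclose(P @ y, (np.dot(y, x) / np.dot(x, x)) * x)   # the formula from the proof
```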

As an application, we can recover a result that we already know, namely:

Proposition

In [math]2[/math] dimensions, the rank [math]1[/math] projections, which are the projections on the [math]Ox[/math] axis rotated by an angle [math]t/2\in[0,\pi)[/math], are given by the following formula:

[[math]] P_t=\frac{1}{2}\begin{pmatrix}1+\cos t&\sin t\\ \sin t&1-\cos t\end{pmatrix} [[/math]]
Together with the following two matrices, which are the rank [math]0[/math] and [math]2[/math] projections in [math]\mathbb R^2[/math],

[[math]] 0=\begin{pmatrix}0&0\\ 0&0\end{pmatrix}\quad,\quad 1=\begin{pmatrix}1&0\\ 0&1\end{pmatrix} [[/math]]
these are all the projections in [math]2[/math] dimensions.


Show Proof

The first assertion follows from the general formula in Theorem 1.36, by plugging in the following vector, depending on a parameter [math]s\in[0,\pi)[/math]:

[[math]] x=\binom{\cos s}{\sin s} [[/math]]

We obtain in this way the following matrix, which with [math]t=2s[/math] is the one in the statement, via some trigonometry:

[[math]] P_{2s}=\begin{pmatrix}\cos^2s&\cos s\sin s\\ \cos s\sin s&\sin^2 s\end{pmatrix} [[/math]]

As for the second assertion, this is clear from the first one, because outside rank 1 we can only have rank 0 or rank 2, corresponding to the matrices in the statement.

Here is another interesting application, this time in [math]N[/math] dimensions:

Proposition

The projection on the all-[math]1[/math] vector [math]\xi\in\mathbb R^N[/math] is

[[math]] P_\xi=\frac{1}{N}\begin{pmatrix} 1&\ldots&1\\ \vdots&&\vdots\\ 1&\ldots&1\end{pmatrix} [[/math]]
with the all-[math]1[/math] matrix on the right being called the flat matrix.


Show Proof

As already pointed out in the proof of Proposition 1.25, the matrix in the statement acts in the following way:

[[math]] P_\xi \begin{pmatrix}x_1\\ \vdots\\ x_N\end{pmatrix} =\frac{x_1+\ldots+x_N}{N}\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} [[/math]]

Thus [math]P_\xi[/math] is indeed a projection onto [math]\mathbb R\xi[/math], and the fact that this projection is the orthogonal one follows either from a direct orthogonality computation, or from the general formula in Theorem 1.36, by plugging in the all-[math]1[/math] vector [math]\xi[/math].

Let us discuss now, as a final topic of this chapter, the isometries of [math]\mathbb R^N[/math]. We have here the following general result:

Theorem

The linear maps [math]f:\mathbb R^N\to\mathbb R^N[/math] which are isometries, in the sense that they preserve the distances, are those coming from the matrices satisfying:

[[math]] U^t=U^{-1} [[/math]]

These latter matrices are called orthogonal, and they form a set [math]O_N\subset M_N(\mathbb R)[/math] which is stable under taking compositions, and inverses.


Show Proof

We have several things to be proved, the idea being as follows:


(1) We recall that we can pass from scalar products to distances, as follows:

[[math]] ||x||=\sqrt{ \lt x,x \gt } [[/math]]

Conversely, we can compute the scalar products in terms of distances, by using the polarization identity, which is as follows:

[[math]] \begin{eqnarray*} ||x+y||^2-||x-y||^2 &=&||x||^2+||y||^2+2 \lt x,y \gt -||x||^2-||y||^2+2 \lt x,y \gt \\ &=&4 \lt x,y \gt \end{eqnarray*} [[/math]]


Now given a matrix [math]U\in M_N(\mathbb R)[/math], we have the following equivalences, with the first one coming from the above identities, and with the other ones being clear:

[[math]] \begin{eqnarray*} ||Ux||=||x|| &\iff& \lt Ux,Uy \gt = \lt x,y \gt \\ &\iff& \lt x,U^tUy \gt = \lt x,y \gt \\ &\iff&U^tUy=y\\ &\iff&U^tU=1\\ &\iff&U^t=U^{-1} \end{eqnarray*} [[/math]]


(2) The second assertion is clear from the definition of the isometries, and can be established as well by using matrices, and the [math]U^t=U^{-1}[/math] criterion.
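
As a numerical check of the criterion [math]U^t=U^{-1}[/math], here is a sketch in Python with numpy, given as an illustration only, verifying the criterion for a rotation, along with the preservation of lengths:

```python
import numpy as np

def R(t):
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

U = R(0.9)
assert np.allclose(U.T @ U, np.eye(2))     # U^t U = 1, that is, U^t = U^{-1}

x = np.array([2.0, -3.0])
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))   # lengths are preserved
```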

As a basic illustration here, we have:

Theorem

The rotations and symmetries in the plane, given by

[[math]] R_t=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\quad,\quad S_t=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix} [[/math]]
are isometries. These are all the linear isometries in [math]2[/math] dimensions.


Show Proof

We already know that [math]R_t[/math] is the rotation of angle [math]t[/math]. As for [math]S_t[/math], this is the symmetry with respect to the [math]Ox[/math] axis rotated by [math]t/2\in\mathbb R[/math]. But this gives the result, since the linear isometries in [math]2[/math] dimensions are either rotations, or symmetries.

As a conclusion, the set [math]O_N[/math] from Theorem 1.39 is a quite fundamental object, with [math]O_2[/math] already consisting of some interesting [math]2\times2[/math] matrices, namely the matrices [math]R_t,S_t[/math]. We will come back to [math]O_N[/math], which is a so-called group, and is actually one of the most important examples of groups, on several occasions, in what follows.


General references

Banica, Teo (2024). "Linear algebra and group theory". arXiv:2206.09283 [math.CO].