16a. Operators, matrices
Welcome to quantum mechanics, in the hope that we will survive. We already talked in chapter 8 about the main idea of Heisenberg, namely using infinite matrices in order to axiomatize quantum mechanics, based on the following key fact: \begin{fact}[Rydberg, Ritz] The spectral lines of the hydrogen atom are given by the Rydberg formula, as follows, depending on integer parameters [math]n_1 \lt n_2[/math]:
[math]\frac{1}{\lambda_{n_1n_2}}=R\left(\frac{1}{n_1^2}-\frac{1}{n_2^2}\right)[/math]
These spectral lines combine according to the Ritz-Rydberg principle, as follows:
[math]\frac{1}{\lambda_{n_1n_2}}+\frac{1}{\lambda_{n_2n_3}}=\frac{1}{\lambda_{n_1n_3}}[/math]
Similar formulae hold for other atoms, with suitable fine-tunings of the constant [math]R[/math]. \end{fact} We refer to chapter 8 for the full story with all this, which is a theory based on some key observations of Lyman, Balmer, Paschen, from around 1890-1900. The point now is that the above combination principle is reminiscent of the multiplication formula [math]e_{n_1n_2}e_{n_2n_3}=e_{n_1n_3}[/math] for the elementary matrices [math]e_{ij}:e_j\to e_i[/math], which leads to the following principle: \begin{principle}[Heisenberg] Observables in quantum mechanics should be some sort of infinite matrices, generalizing the Lyman, Balmer, Paschen lines of the hydrogen atom, and multiplying between them as matrices do, so as to produce further observables. \end{principle} All this is quite deep, and needs a number of comments, as follows:
(1) First of all, our matrices must indeed be infinite, because so are the series observed by Lyman, Balmer, Paschen, corresponding to [math]n_1=1,2,3[/math] in the Rydberg formula, and making it clear that the range of the second parameter [math]n_2 \gt n_1[/math] goes up to [math]\infty[/math].
(2) Although this was not known to Rydberg, Ritz and Heisenberg, let us also mention that some later results of Brackett, Pfund, Humphreys and others, at [math]n_1=4,5,6,\ldots\,[/math], confirmed that the range of the first parameter [math]n_1[/math] goes up to [math]\infty[/math] too.
(3) As a trickier comment now, going beyond what Principle 16.2 says, our infinite matrices must in fact be complex. This was something known to Heisenberg, and later Schrödinger gave a proof that quantum mechanics naturally lives over [math]\mathbb C[/math].
(4) But all this leads us into some tricky mathematics, because the infinite matrices [math]A\in M_\infty(\mathbb C)[/math] do not act on the vectors [math]v\in\mathbb C^\infty[/math] just like that. For instance the all-one matrix [math]A_{ij}=1[/math] does not act on the all-one vector [math]v_i=1[/math], the entries of [math]Av[/math] being all equal to [math]\sum_j1=\infty[/math].
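Before getting into the mathematics, the Rydberg formula and the Ritz-Rydberg combination principle from Fact 16.1 can be checked numerically, in a minimal sketch assuming numpy-free Python and an approximate value of the Rydberg constant:

```python
# Rydberg formula: inverse wavelengths 1/lambda = R(1/n1^2 - 1/n2^2),
# combining as nu(n1,n2) + nu(n2,n3) = nu(n1,n3), as the e_{ij} matrices do.
R = 1.097e7  # Rydberg constant, in m^-1 (approximate value, our assumption)

def nu(n1, n2):
    # inverse wavelength of the (n1, n2) spectral line, with n1 < n2
    return R * (1.0 / n1**2 - 1.0 / n2**2)

# the Lyman, Balmer, Paschen series correspond to n1 = 1, 2, 3
lyman = [nu(1, n2) for n2 in range(2, 6)]
balmer = [nu(2, n2) for n2 in range(3, 6)]

# Ritz-Rydberg combination principle: (1,2) + (2,3) = (1,3)
assert abs(nu(1, 2) + nu(2, 3) - nu(1, 3)) < 1e-3
```

The combination principle holds exactly here, since the [math]1/n_2^2[/math] terms telescope, which is precisely the mechanism behind the elementary matrix multiplication [math]e_{n_1n_2}e_{n_2n_3}=e_{n_1n_3}[/math].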
Summarizing, in order to get some mathematical theory going, out of Principle 16.2, we must assume that our matrices [math]A\in M_\infty(\mathbb C)[/math] are “bounded” in some sense. Or perhaps that the vectors [math]v\in\mathbb C^\infty[/math] are bounded. Or perhaps both.
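To illustrate the boundedness problem numerically, here is a quick check, assuming numpy, that the finite truncations of the all-one matrix admit no uniform bound, the norm ratio growing linearly with the truncation size:

```python
import numpy as np

# truncations of the all-one matrix acting on the all-one vector:
# A_N v_N has entries N, so ||A_N v_N|| / ||v_N|| = N, which is unbounded
for N in [10, 100, 1000]:
    A = np.ones((N, N))
    v = np.ones(N)
    ratio = np.linalg.norm(A @ v) / np.linalg.norm(v)
    assert np.isclose(ratio, N)
```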
In order to fix all this, let us start with [math]\mathbb C^\infty[/math]. We would like to replace it with its subspace [math]H=l^2(\mathbb N)[/math], consisting of the vectors having finite norm, so that our various computations converge. This being said, taking a look at what Schrödinger was saying too, a bit later, why not include right away in our theory spaces like [math]H=L^2(\mathbb R^3)[/math] too, which are perhaps a bit more relevant than Heisenberg's [math]l^2(\mathbb N)[/math]. We are led in this way to:
A Hilbert space is a complex vector space [math]H[/math] with a scalar product [math] \lt x,y \gt [/math], which is linear on the left and antilinear on the right, and which is complete with respect to the associated norm
[math]||x||=\sqrt{ \lt x,x \gt }[/math]
Here our convention for scalar products, written [math] \lt x,y \gt [/math] and linear on the left, is one among others, the one often used by mathematicians, and we will just use this, in the lack of a physicist with an axe around. As further comments on Definition 16.3, there is some mathematics encapsulated there, needing some discussion. First, we have:
Given an index set [math]I[/math], which can be finite or not, the space of square-summable vectors having indices in [math]I[/math], namely
[math]l^2(I)=\left\{x\in\mathbb C^I\Big|\sum_{i\in I}|x_i|^2 \lt \infty\right\}[/math]
is a Hilbert space, with scalar product [math] \lt x,y \gt =\sum_{i\in I}x_i\bar{y}_i[/math]. When [math]|I| \lt \infty[/math] we obtain in this way the usual space [math]H=\mathbb C^{|I|}[/math].
We have already met such things in chapter 7, but let us recall all this:
(1) We know that [math]l^2(I)\subset\mathbb C^I[/math] is the space of vectors satisfying [math]||x|| \lt \infty[/math]. We want to prove that [math]l^2(I)[/math] is a vector space, that [math] \lt x,y \gt [/math] is a scalar product on it, that [math]l^2(I)[/math] is complete with respect to [math]||.||[/math], and finally that for [math]|I| \lt \infty[/math] we have [math]l^2(I)=\mathbb C^{|I|}[/math].
(2) The last assertion, [math]l^2(I)=\mathbb C^{|I|}[/math] for [math]|I| \lt \infty[/math], is clear, because in this case the sums are finite, so the condition [math]||x|| \lt \infty[/math] is automatic. So, we know at least one thing.
(3) Regarding the rest, our claim here, which will more or less prove everything, is that for any two vectors [math]x,y\in l^2(I)[/math] we have the Cauchy-Schwarz inequality:
[math]| \lt x,y \gt |\leq||x||\cdot||y||[/math]
But this follows from the positivity of the following degree 2 quantity, depending on a real variable [math]t\in\mathbb R[/math], and on a variable on the unit circle, [math]w\in\mathbb T[/math]:
[math]f(t)=||twx+y||^2[/math]
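For the reader's convenience, here is the standard discriminant argument spelled out, written as a short derivation:

```latex
% choose w \in \mathbb{T} such that w\langle x,y\rangle = -|\langle x,y\rangle|; then
f(t) = \|twx+y\|^2
     = t^2\|x\|^2 - 2t\,|\langle x,y\rangle| + \|y\|^2
     \;\geq\; 0 \quad\text{for all } t\in\mathbb{R}
% so the discriminant of this degree 2 polynomial must be negative:
4\,|\langle x,y\rangle|^2 - 4\,\|x\|^2\|y\|^2 \;\leq\; 0
% which is exactly the Cauchy-Schwarz inequality.
```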
(4) Now with Cauchy-Schwarz proved, everything is straightforward. We first obtain, by raising to the square and expanding, that for any [math]x,y\in l^2(I)[/math] we have the triangle inequality:
[math]||x+y||\leq||x||+||y||[/math]
Thus [math]l^2(I)[/math] is indeed a vector space, the other vector space conditions being trivial.
(5) Also, [math] \lt x,y \gt [/math] is surely a scalar product on this vector space, because all the conditions for a scalar product are trivially satisfied.
(6) Finally, the fact that our space [math]l^2(I)[/math] is indeed complete with respect to its norm [math]||.||[/math] follows in the obvious way, the limit of a Cauchy sequence [math]\{x_n\}[/math] being the vector [math]y=(y_i)[/math] given by [math]y_i=\lim_{n\to\infty}x_{ni}[/math], with the needed verifications here being routine.
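For completeness of exposition, the limit argument here can be written out as follows, as a standard verification:

```latex
% if \{x_n\} \subset l^2(I) is Cauchy, each coordinate sequence is Cauchy, since
|x_{ni} - x_{mi}| \;\leq\; \|x_n - x_m\|
% so y_i = \lim_{n\to\infty} x_{ni} exists; for any finite subset F \subset I,
\sum_{i\in F}|y_i - x_{ni}|^2
  = \lim_{m\to\infty}\sum_{i\in F}|x_{mi} - x_{ni}|^2
  \;\leq\; \sup_{m\geq n}\|x_m - x_n\|^2
% and taking the supremum over all finite F \subset I gives
% y - x_n \in l^2(I), so y \in l^2(I), and \|y - x_n\| \to 0, as desired.
```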
Going now a bit abstract, we have, more generally, the following result, which shows that our formalism covers as well the Schrödinger spaces of type [math]L^2(\mathbb R^3)[/math]:
Given an arbitrary space [math]X[/math] with a positive measure [math]\mu[/math] on it, the space of square-summable complex functions on it, namely
[math]L^2(X)=\left\{f:X\to\mathbb C\Big|\int_X|f(x)|^2\,d\mu(x) \lt \infty\right\}[/math]
is a Hilbert space, with scalar product [math] \lt f,g \gt =\int_Xf(x)\overline{g(x)}\,d\mu(x)[/math]. When [math]X=I[/math] is a discrete space, with its counting measure, we recover in this way the spaces [math]l^2(I)[/math].
This is something routine, a remake of Theorem 16.4, as follows:
(1) The proof of the first, and main assertion is something perfectly similar to the proof of Theorem 16.4, by replacing everywhere the sums by integrals.
(2) With the remark that we forgot to say in the statement that the [math]L^2[/math] functions are by definition taken up to equality almost everywhere, [math]f=g[/math] when [math]||f-g||=0[/math].
(3) As for the last assertion, when [math]\mu[/math] is the counting measure all our integrals here become usual sums, and so we recover in this way Theorem 16.4.
As a third and last theorem about Hilbert spaces that we will need, we have:
Any Hilbert space [math]H[/math] has an orthonormal basis [math]\{e_i\}_{i\in I}[/math], which is by definition a set of vectors whose span is dense in [math]H[/math], and which satisfy:
[math] \lt e_i,e_j \gt =\delta_{ij}[/math]
Moreover, [math]H[/math] is separable precisely when such a basis can be taken countable, and this is the case for the spaces [math]l^2(\mathbb N)[/math] and [math]L^2(\mathbb R)[/math].
We have many assertions here, the idea being as follows:
(1) In finite dimensions an orthonormal basis [math]\{e_i\}_{i\in I}[/math] can be constructed by starting with any vector space basis [math]\{x_i\}_{i\in I}[/math], and using the Gram-Schmidt procedure. As for the other assertions, these are all clear, from basic linear algebra.
(2) In general, the same method works, namely Gram-Schmidt, with a subtlety coming from the fact that the basis [math]\{e_i\}_{i\in I}[/math] will not span in general the whole [math]H[/math], but just a dense subspace of it, as is in fact obvious by looking at the standard basis of [math]l^2(\mathbb N)[/math].
(3) And there is a second subtlety as well, coming from the fact that the recurrence needed for Gram-Schmidt must be replaced by some sort of “transfinite recurrence”, using scary tools from logic, and more specifically Zorn's lemma.
(4) Finally, everything at the end is clear from definitions, except perhaps for the fact that [math]L^2(\mathbb R)[/math] is separable. But here we can argue that, since functions can be approximated by polynomials on bounded intervals, by the Weierstrass theorem, the countable family [math]\{x^n\}_{n\in\mathbb N}[/math], called the Weierstrass basis, spans a dense subspace, that we can orthogonalize afterwards by using Gram-Schmidt.
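The Gram-Schmidt procedure from (1) can be sketched numerically as follows, in a minimal implementation assuming numpy, applied to a few random complex vectors:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent family, as in (1):
    subtract projections on the previous basis vectors, then normalize."""
    basis = []
    for x in vectors:
        for e in basis:
            # np.vdot conjugates its first argument, matching our convention
            x = x - np.vdot(e, x) * e
        basis.append(x / np.linalg.norm(x))
    return basis

rng = np.random.default_rng(1)
vecs = [rng.normal(size=5) + 1j * rng.normal(size=5) for _ in range(3)]
basis = gram_schmidt(vecs)

# orthonormality: <e_i, e_j> = delta_ij
for i, ei in enumerate(basis):
    for j, ej in enumerate(basis):
        assert np.isclose(np.vdot(ei, ej), float(i == j))
```

The same recipe, with the scalar product replaced by an integral, is what produces the classical orthogonal polynomial families out of [math]\{x^n\}_{n\in\mathbb N}[/math].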
Moving ahead, now that we know what our vector spaces are, we can talk about infinite matrices with respect to them. And the situation here is as follows:
Given a Hilbert space [math]H[/math], consider the linear operators [math]T:H\to H[/math], and for each such operator define its norm by the following formula:
[math]||T||=\sup_{||x||\leq1}||Tx||[/math]
Then the operators which are bounded, [math]||T|| \lt \infty[/math], form a complex algebra [math]B(H)[/math], satisfying [math]||ST||\leq||S||\cdot||T||[/math], which is complete with respect to [math]||.||[/math]. When [math]H[/math] comes with an orthonormal basis [math]\{e_i\}_{i\in I}[/math], we have an embedding [math]B(H)\subset M_I(\mathbb C)[/math], given by [math]T\to M[/math], [math]M_{ij}= \lt Te_j,e_i \gt [/math], which is an isomorphism in finite dimensions, but not in infinite dimensions.
This is something straightforward, the idea being as follows:
(1) The fact that we have indeed an algebra, satisfying the product condition in the statement, follows from the following estimates, which are all elementary:
[math]||S+T||\leq||S||+||T||\quad,\quad||\lambda T||=|\lambda|\cdot||T||\quad,\quad||ST||\leq||S||\cdot||T||[/math]
(2) Regarding now the completeness assertion, if [math]\{T_n\}\subset B(H)[/math] is Cauchy then [math]\{T_nx\}[/math] is Cauchy for any [math]x\in H[/math], so we can define the limit [math]T=\lim_{n\to\infty}T_n[/math] by setting:
[math]Tx=\lim_{n\to\infty}T_nx[/math]
Let us first check that the map [math]x\to Tx[/math] is linear. We have:
[math]T(x+y)=\lim_{n\to\infty}T_n(x+y)=\lim_{n\to\infty}(T_nx+T_ny)=Tx+Ty[/math]
Similarly, we have [math]T(\lambda x)=\lambda T(x)[/math], and we conclude that [math]T\in\mathcal L(H)[/math].
(3) With this done, it remains to prove now that we have [math]T\in B(H)[/math], and that [math]T_n\to T[/math] in norm. For this purpose, observe that for any [math]x\in H[/math] we have:
[math]||Tx-T_nx||=\lim_{m\to\infty}||T_mx-T_nx||\leq\sup_{m\geq n}||T_m-T_n||\cdot||x||[/math]
But this gives both [math]T\in B(H)[/math], and [math]T_n\to T[/math] in norm, and we are done.
(4) Regarding the embedding, the correspondence [math]T\to M[/math] in the statement is indeed linear, and its kernel is [math]\{0\}[/math], so we have indeed an embedding, as claimed:
[math]B(H)\subset M_I(\mathbb C)\quad,\quad M_{ij}= \lt Te_j,e_i \gt [/math]
In finite dimensions we have an isomorphism, because any [math]M\in M_N(\mathbb C)[/math] determines an operator [math]T:\mathbb C^N\to\mathbb C^N[/math], given by [math] \lt Te_j,e_i \gt =M_{ij}[/math]. However, in infinite dimensions, we have matrices not producing operators, as for instance the all-one matrix.
(5) As for the examples of linear operators which are not bounded, these are more complicated, coming from logic, and we will not need them in what follows.
Finally, as a second and last result regarding the operators, we will need:
Each operator [math]T\in B(H)[/math] has an adjoint [math]T^*\in B(H)[/math], given by:
[math] \lt Tx,y \gt = \lt x,T^*y \gt [/math]
This adjoint satisfies [math]||TT^*||=||T||^2[/math], and when [math]H[/math] comes with an orthonormal basis, it is given by the formula
[math](M^*)_{ij}=\bar{M}_{ji}[/math]
at the level of the associated matrices [math]M\in M_I(\mathbb C)[/math].
This is standard too, and can be proved in 3 steps, as follows:
(1) The existence of the adjoint operator [math]T^*[/math], given by the formula in the statement, comes from the fact that [math]\varphi(x)= \lt Tx,y \gt [/math] is a continuous linear map [math]H\to\mathbb C[/math], so by the Riesz representation theorem we must have a formula as follows, for a certain vector [math]T^*y\in H[/math]:
[math]\varphi(x)= \lt x,T^*y \gt [/math]
Moreover, since this vector is unique, [math]T^*[/math] is unique too, and the same argument shows that the map [math]y\to T^*y[/math] is linear, in the sense that we have as well:
[math]T^*(y+z)=T^*y+T^*z\quad,\quad T^*(\lambda y)=\lambda T^*y[/math]
Observe also that we have indeed [math]T^*\in B(H)[/math], because:
[math]||T^*y||^2= \lt T^*y,T^*y \gt = \lt TT^*y,y \gt \leq||T||\cdot||T^*y||\cdot||y||[/math]
Indeed, this gives [math]||T^*y||\leq||T||\cdot||y||[/math], and so [math]||T^*||\leq||T||[/math].
(2) Regarding now [math]||TT^*||=||T||^2[/math], which is a key formula, observe that we have:
[math]||TT^*||\leq||T||\cdot||T^*||\leq||T||^2[/math]
On the other hand, we have as well the following estimate:
[math]||T||^2=\sup_{||x||=1} \lt Tx,Tx \gt =\sup_{||x||=1} \lt x,T^*Tx \gt \leq||T^*T||[/math]
By replacing [math]T\to T^*[/math] we obtain from this [math]||T||^2\leq||TT^*||[/math], as desired.
(3) Finally, when [math]H[/math] comes with a basis, the formula [math] \lt Tx,y \gt = \lt x,T^*y \gt [/math] applied with [math]x=e_i[/math], [math]y=e_j[/math] translates into the formula [math](M^*)_{ij}=\overline{M}_{ji}[/math], as desired.
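Both the adjoint formula and the key identity [math]||TT^*||=||T||^2[/math] can be checked numerically in finite dimensions, in a sketch assuming numpy, with the operator norm computed as the largest singular value:

```python
import numpy as np

rng = np.random.default_rng(0)

def inner(x, y):
    # scalar product, linear on the left, antilinear on the right
    return np.sum(x * np.conj(y))

def op_norm(M):
    # operator norm of a matrix = its largest singular value
    return np.linalg.norm(M, 2)

M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Mstar = M.conj().T  # the adjoint matrix, (M*)_{ij} = conj(M_{ji})

x = rng.normal(size=4) + 1j * rng.normal(size=4)
y = rng.normal(size=4) + 1j * rng.normal(size=4)

# the adjoint formula <Tx,y> = <x,T*y>
assert np.isclose(inner(M @ x, y), inner(x, Mstar @ y))

# the key formula ||TT*|| = ||T||^2
assert np.isclose(op_norm(M @ Mstar), op_norm(M) ** 2)
```

The second assertion holds exactly for matrices, since the singular values of [math]MM^*[/math] are the squares of those of [math]M[/math].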
So long for Hilbert spaces and operators. For more, you can check my book [1].
General references
Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].