Revision as of 21:38, 22 April 2025 by Bot

2a. Hilbert spaces



We discuss in what follows an extension of the linear algebra results from the previous chapter, obtained by looking at the linear operators [math]T:H\to H[/math], with the space [math]H[/math] being no longer assumed to be finite dimensional. Our motivations come from quantum mechanics, and in order to get motivated, here is some suggested reading:


(1) Generally speaking, physics is best learned from Feynman [1]. If you already know some, and want to learn quantum mechanics, go with Griffiths [2]. And if you're already a bit familiar with quantum mechanics, a good book is Weinberg [3].


(2) A look at classics like Dirac [4], von Neumann [5] or Weyl [6] can be instructive too. At the opposite end, you also have modern, fancy books on quantum information, such as Bengtsson-Życzkowski [7], Nielsen-Chuang [8] or Watrous [9].


(3) In short, there are many ways of getting familiar with this big mess which is quantum mechanics, and as long as you stay away from books advertised as “rigorous”, “axiomatic”, “mathematical”, things are fine. By the way, you can try as well my book [10].


Getting to work now, physics tells us to look at infinite dimensional complex spaces, such as the space of wave functions [math]\psi:\mathbb R^3\to\mathbb C[/math] of the electron. In order to do some mathematics on these spaces, we will need scalar products. So, let us start with:

Definition

A scalar product on a complex vector space [math]H[/math] is a binary operation [math]H\times H\to\mathbb C[/math], denoted [math](x,y)\to \lt x,y \gt [/math], satisfying the following conditions:

  • [math] \lt x,y \gt [/math] is linear in [math]x[/math], and antilinear in [math]y[/math].
  • [math]\overline{ \lt x,y \gt }= \lt y,x \gt [/math], for any [math]x,y[/math].
  • [math] \lt x,x \gt \gt 0[/math], for any [math]x\neq0[/math].

As before in chapter 1, we use here mathematicians' convention for scalar products, that is, [math] \lt \,, \gt [/math] linear at left, as opposed to physicists' convention, [math] \lt \,, \gt [/math] linear at right. The reasons for this are quite subtle, coming from the fact that, while basic quantum mechanics looks better with [math] \lt \,, \gt [/math] linear at right, advanced quantum mechanics looks better with [math] \lt \,, \gt [/math] linear at left. Or at least that's what my cats say.


As a basic example for Definition 2.1, we have the finite dimensional vector space [math]H=\mathbb C^N[/math], with its usual scalar product, namely:

[[math]] \lt x,y \gt =\sum_ix_i\bar{y}_i [[/math]]
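As a quick numerical illustration of the axioms in this basic example, here is a minimal Python sketch. The helper name `inner` and the sample vectors are chosen here for illustration only, and the checks are floating point sanity checks, not proofs.

```python
def inner(x, y):
    """Scalar product on C^N: linear in x, antilinear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

# Illustrative sample data (an assumption of this sketch).
x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
z = [0.5j, -1 + 1j]
c = 2 - 3j

# Linearity in the first variable: <cx+z, y> = c<x,y> + <z,y>.
lin_left = inner([c * a + b for a, b in zip(x, z)], y)
lin_right = c * inner(x, y) + inner(z, y)

# Antilinearity in the second variable: <x, cy> = conj(c)<x,y>.
anti_left = inner(x, [c * b for b in y])
anti_right = c.conjugate() * inner(x, y)

# Conjugate symmetry, and positivity of <x,x>.
sym_gap = abs(inner(x, y).conjugate() - inner(y, x))
pos = inner(x, x)

print(abs(lin_left - lin_right), abs(anti_left - anti_right), sym_gap, pos)
```

The first three printed numbers should vanish up to rounding, and the last one should be a positive real number.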

There are many other examples, and notably various spaces of [math]L^2[/math] functions, which naturally appear in problems coming from physics. We will discuss them later on. In order to study now the scalar products, let us formulate the following definition:

Definition

The norm of a vector [math]x\in H[/math] is the following quantity:

[[math]] ||x||=\sqrt{ \lt x,x \gt } [[/math]]
We also call this number the length of [math]x[/math], or the distance from [math]x[/math] to the origin.

The terminology comes from what happens in [math]\mathbb C^N[/math], where the length of the vector, as defined above, coincides with the usual length, given by:

[[math]] ||x||=\sqrt{\sum_i|x_i|^2} [[/math]]


In analogy with what happens in finite dimensions, we have two important results regarding the norms. First we have the Cauchy-Schwarz inequality, as follows:

Theorem

We have the Cauchy-Schwarz inequality

[[math]] | \lt x,y \gt |\leq||x||\cdot||y|| [[/math]]
and the equality case holds precisely when [math]x,y[/math] are proportional.


Proof

This is something very standard. Consider indeed the following quantity, depending on a real variable [math]t\in\mathbb R[/math], and on a variable on the unit circle, [math]w\in\mathbb T[/math]:

[[math]] f(t)=||twx+y||^2 [[/math]]


By developing [math]f[/math], we see that this is a degree 2 polynomial in [math]t[/math]:

[[math]] \begin{eqnarray*} f(t) &=& \lt twx+y,twx+y \gt \\ &=&t^2 \lt x,x \gt +tw \lt x,y \gt +t\bar{w} \lt y,x \gt + \lt y,y \gt \\ &=&t^2||x||^2+2tRe(w \lt x,y \gt )+||y||^2 \end{eqnarray*} [[/math]]


Since we have [math]f\geq0[/math] everywhere, the discriminant of [math]f[/math] must be negative or zero:

[[math]] 4Re(w \lt x,y \gt )^2-4||x||^2\cdot||y||^2\leq0 [[/math]]


But this is equivalent to the following condition:

[[math]] |Re(w \lt x,y \gt )|\leq||x||\cdot||y|| [[/math]]


Now the point is that we can arrange for the number [math]w\in\mathbb T[/math] to be such that the quantity [math]w \lt x,y \gt [/math] is real. Thus, we obtain the following inequality:

[[math]] | \lt x,y \gt |\leq||x||\cdot||y|| [[/math]]


Finally, the study of the equality case is straightforward, by using the fact that the discriminant of [math]f[/math] vanishes precisely when we have a root. But this leads to the conclusion in the statement, namely that the vectors [math]x,y[/math] must be proportional.
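The steps of this proof can be followed numerically in [math]\mathbb C^N[/math]. The sketch below is only an illustration, with the helper names `inner`, `norm`, `f` and the sample vectors being assumptions made here. It picks the phase [math]w[/math] making [math]w \lt x,y \gt [/math] real, and then evaluates the discriminant of [math]f[/math].

```python
import math

def inner(x, y):
    """Scalar product on C^N: linear in x, antilinear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

# Illustrative sample data (an assumption of this sketch).
x = [1 + 2j, 3 - 1j, 0.5j]
y = [2 - 1j, 1j, -1 + 1j]

s = inner(x, y)
# Phase w on the unit circle such that w<x,y> is real (and equal to |<x,y>|).
w = s.conjugate() / abs(s) if s != 0 else 1
rotated = w * s

def f(t):
    """The quadratic f(t) = ||twx + y||^2 from the proof."""
    return norm([t * w * a + b for a, b in zip(x, y)]) ** 2

# Discriminant of f(t) = t^2 ||x||^2 + 2t Re(w<x,y>) + ||y||^2.
disc = 4 * rotated.real ** 2 - 4 * norm(x) ** 2 * norm(y) ** 2
cs_holds = abs(s) <= norm(x) * norm(y)

# Equality case: y proportional to x.
y_prop = [(2 - 3j) * a for a in x]
gap = norm(x) * norm(y_prop) - abs(inner(x, y_prop))

print(disc, cs_holds, gap)
```

For generic vectors the discriminant comes out strictly negative, while for proportional vectors the gap between the two sides of the inequality vanishes, up to rounding.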

As a second main result now, we have the Minkowski inequality:

Theorem

We have the Minkowski inequality

[[math]] ||x+y||\leq||x||+||y|| [[/math]]
and the equality case holds precisely when [math]x,y[/math] are proportional, with proportionality factor [math]\lambda\geq0[/math].


Proof

This follows indeed from the Cauchy-Schwarz inequality, as follows:

[[math]] \begin{eqnarray*} &&||x+y||\leq||x||+||y||\\ &\iff&||x+y||^2\leq(||x||+||y||)^2\\ &\iff&||x||^2+||y||^2+2Re \lt x,y \gt \leq||x||^2+||y||^2+2||x||\cdot||y||\\ &\iff&Re \lt x,y \gt \leq||x||\cdot||y|| \end{eqnarray*} [[/math]]


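As for the equality case, this is clear from Cauchy-Schwarz as well, with the remark that the condition [math]Re \lt x,y \gt =||x||\cdot||y||[/math] forces the proportionality factor between [math]x,y[/math] to be a real number [math]\lambda\geq0[/math].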
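Here is a small numerical check of the Minkowski inequality in [math]\mathbb C^N[/math], written as a hedged Python sketch. The helper names and sample vectors are illustrative assumptions. The three cases computed are a generic pair, the pair [math]x,2x[/math], and the pair [math]x,-x[/math].

```python
import math

def norm(x):
    return math.sqrt(sum(abs(a) ** 2 for a in x))

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def gap(x, y):
    """The difference ||x|| + ||y|| - ||x+y||, which is always >= 0."""
    return norm(x) + norm(y) - norm(add(x, y))

# Illustrative sample data (an assumption of this sketch).
x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]

generic_gap = gap(x, y)                      # strictly positive
same_dir_gap = gap(x, [2 * a for a in x])    # zero: y = 2x
opposite_gap = gap(x, [-a for a in x])       # equals 2||x||: y = -x

print(generic_gap, same_dir_gap, opposite_gap)
```

The last case shows that proportional vectors pointing in opposite directions do not give equality, since there [math]||x+y||=0[/math].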

As a consequence of this, we have the following result:

Theorem

The following function is a distance on [math]H[/math],

[[math]] d(x,y)=||x-y|| [[/math]]
in the usual sense, that of the abstract metric spaces.


Proof

This follows indeed from the Minkowski inequality, which corresponds to the triangle inequality, the other two axioms for a distance being trivially satisfied.

The above result is quite important, because it shows that we can do geometry and analysis in our present setting, with distances and angles, a bit as in the finite dimensional case. In order to do such abstract geometry, we will often need the following key result, which shows that everything can be recovered in terms of distances:

Proposition

The scalar products can be recovered from distances, via the formula

[[math]] 4 \lt x,y \gt =||x+y||^2-||x-y||^2 +i||x+iy||^2-i||x-iy||^2 [[/math]]
called complex polarization identity.


Proof

This is something that we have already met in finite dimensions. In arbitrary dimensions the proof is similar, as follows:

[[math]] \begin{eqnarray*} &&||x+y||^2-||x-y||^2+i||x+iy||^2-i||x-iy||^2\\ &=&||x||^2+||y||^2-||x||^2-||y||^2+i||x||^2+i||y||^2-i||x||^2-i||y||^2\\ &&+2Re( \lt x,y \gt )+2Re( \lt x,y \gt )+2iIm( \lt x,y \gt )+2iIm( \lt x,y \gt )\\ &=&4 \lt x,y \gt \end{eqnarray*} [[/math]]


Thus, we are led to the conclusion in the statement.
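The polarization identity is easy to test numerically in [math]\mathbb C^N[/math]. The following Python sketch, with the helper names `inner`, `norm_sq`, `polarization` and the sample vectors being illustrative assumptions, recovers the scalar product from the four squared norms.

```python
def inner(x, y):
    """Scalar product on C^N: linear in x, antilinear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def combo(x, y, c):
    """The vector x + c*y."""
    return [a + c * b for a, b in zip(x, y)]

def polarization(x, y):
    """Right-hand side of the complex polarization identity, divided by 4."""
    return (norm_sq(combo(x, y, 1)) - norm_sq(combo(x, y, -1))
            + 1j * norm_sq(combo(x, y, 1j)) - 1j * norm_sq(combo(x, y, -1j))) / 4

# Illustrative sample data (an assumption of this sketch).
x = [1 + 2j, 3 - 1j, 0.5j]
y = [2 - 1j, 1j, -1 + 1j]

error = abs(polarization(x, y) - inner(x, y))
print(error)
```

The printed error should be zero up to rounding. Note that the signs in front of the [math]i[/math] terms depend on the convention used here, with the scalar product being linear at left.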

In order to do analysis on our spaces, we need the Cauchy sequences that we construct to converge. This is something which is automatic in finite dimensions, but in arbitrary dimensions, this can fail. It is convenient here to formulate a detailed new definition, as follows, which will be the starting point for our various considerations to follow:

Definition

A Hilbert space is a complex vector space [math]H[/math] given with a scalar product [math] \lt x,y \gt [/math], satisfying the following conditions:

  • [math] \lt x,y \gt [/math] is linear in [math]x[/math], and antilinear in [math]y[/math].
  • [math]\overline{ \lt x,y \gt }= \lt y,x \gt [/math], for any [math]x,y[/math].
  • [math] \lt x,x \gt \gt 0[/math], for any [math]x\neq0[/math].
  • [math]H[/math] is complete with respect to the norm [math]||x||=\sqrt{ \lt x,x \gt }[/math].

In other words, we have taken here Definition 2.1 above, and added the condition that [math]H[/math] must be complete with respect to the norm [math]||x||=\sqrt{ \lt x,x \gt }[/math], that we know indeed to be a norm, according to the Minkowski inequality proved above. As a basic example, as before, we have the space [math]H=\mathbb C^N[/math], with its usual scalar product, namely:

[[math]] \lt x,y \gt =\sum_ix_i\bar{y}_i [[/math]]


More generally now, we have the following construction of Hilbert spaces:

Proposition

The sequences of complex numbers [math](x_i)[/math] which are square-summable,

[[math]] \sum_i|x_i|^2 \lt \infty [[/math]]
form a Hilbert space [math]l^2(\mathbb N)[/math], with the following scalar product:

[[math]] \lt x,y \gt =\sum_ix_i\bar{y}_i [[/math]]
In fact, given any index set [math]I[/math], we can construct a Hilbert space [math]l^2(I)[/math], in this way.


Proof

There are several things to be proved, as follows:


(1) Our first claim is that [math]l^2(\mathbb N)[/math] is a vector space. For this purpose, we must prove that [math]x,y\in l^2(\mathbb N)[/math] implies [math]x+y\in l^2(\mathbb N)[/math]. But this leads us into proving [math]||x+y||\leq||x||+||y||[/math], where [math]||x||=\sqrt{ \lt x,x \gt }[/math]. Now since we know this inequality to hold on each subspace [math]\mathbb C^N\subset l^2(\mathbb N)[/math] obtained by truncating, this inequality holds everywhere, as desired.


(2) Our second claim is that [math] \lt \,, \gt [/math] is well-defined on [math]l^2(\mathbb N)[/math]. But this follows from the Cauchy-Schwarz inequality, [math]| \lt x,y \gt |\leq||x||\cdot||y||[/math], which can be established by truncating, a bit like we established the Minkowski inequality in (1) above.


(3) It is also clear that [math] \lt \,, \gt [/math] is a scalar product on [math]l^2(\mathbb N)[/math], so it remains to prove that [math]l^2(\mathbb N)[/math] is complete with respect to [math]||x||=\sqrt{ \lt x,x \gt }[/math]. But this is clear, because if we pick a Cauchy sequence [math]\{x^n\}_{n\in\mathbb N}\subset l^2(\mathbb N)[/math], then each numeric sequence [math]\{x^n_i\}_{i\in\mathbb N}\subset\mathbb C[/math] is Cauchy, and by setting [math]x_i=\lim_{n\to\infty}x^n_i[/math], we have [math]x^n\to x[/math] inside [math]l^2(\mathbb N)[/math], as desired.


(4) Finally, the same arguments extend to the case of an arbitrary index set [math]I[/math], leading to a Hilbert space [math]l^2(I)[/math], and with the remark here that there is absolutely no problem in talking about quantities of type [math]||x||^2=\sum_{i\in I}|x_i|^2\in[0,\infty][/math], even if the index set [math]I[/math] is uncountable, because we are summing positive numbers.
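To get a feeling for the square-summability condition, and for the truncation arguments used above, here is a small Python sketch. The two sample sequences, [math]x_i=1/i[/math] and [math]x_i=1/\sqrt{i}[/math], and the helper name `partial_norm_sq` are illustrative choices made here. The first sequence belongs to [math]l^2(\mathbb N)[/math], with [math]||x||^2=\pi^2/6[/math], while the second one does not, its truncated norms growing without bound.

```python
import math

def partial_norm_sq(term, n):
    """Squared norm of the truncation (x_1, ..., x_n), viewed inside C^n."""
    return sum(abs(term(i)) ** 2 for i in range(1, n + 1))

def harmonic(i):
    return 1 / i                # square-summable

def slow(i):
    return 1 / math.sqrt(i)     # not square-summable

sizes = (10, 100, 1000, 10000)
limit = math.pi ** 2 / 6

sums_in = [partial_norm_sq(harmonic, n) for n in sizes]
sums_out = [partial_norm_sq(slow, n) for n in sizes]

print(sums_in, limit)
print(sums_out)
```

The truncated squared norms of the first sequence increase towards [math]\pi^2/6[/math], while those of the second one behave like [math]\log n[/math].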

Even more generally, we have the following construction of Hilbert spaces:

Theorem

Given a measured space [math]X[/math], the functions [math]f:X\to\mathbb C[/math], taken up to equality almost everywhere, which are square-integrable,

[[math]] \int_X|f(x)|^2dx \lt \infty [[/math]]
form a Hilbert space [math]L^2(X)[/math], with the following scalar product:

[[math]] \lt f,g \gt =\int_Xf(x)\overline{g(x)}dx [[/math]]
In the case [math]X=I[/math], with the counting measure, we obtain in this way the space [math]l^2(I)[/math].


Proof

This is a straightforward generalization of Proposition 2.8, with the arguments from the proof of Proposition 2.8 carrying over in our case, as follows:


(1) The first part, regarding Cauchy-Schwarz and Minkowski, extends without problems, by using this time approximation by step functions.


(2) Regarding the fact that [math] \lt \,, \gt [/math] is indeed a scalar product on [math]L^2(X)[/math], there is a subtlety here, because if we want [math] \lt f,f \gt \gt 0[/math] for [math]f\neq 0[/math], we must declare that [math]f=0[/math] when [math]f=0[/math] almost everywhere, and so that [math]f=g[/math] when [math]f=g[/math] almost everywhere.


(3) Regarding the fact that [math]L^2(X)[/math] is complete with respect to [math]||f||=\sqrt{ \lt f,f \gt }[/math], this is again basic measure theory, by picking a Cauchy sequence [math]\{f_n\}_{n\in\mathbb N}\subset L^2(X)[/math], passing to a subsequence which converges pointwise almost everywhere, and then checking that the limit [math]f[/math] belongs to [math]L^2(X)[/math], and that we have [math]f_n\to f[/math] in the [math]L^2[/math] norm.


(4) Finally, the last assertion is clear, because the integration with respect to the counting measure is by definition a sum, and so [math]L^2(I)=l^2(I)[/math] in this case.
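As an illustration, here is how the scalar product of [math]L^2(X)[/math] can be approximated numerically, in the particular case [math]X=[0,1][/math] with its usual Lebesgue measure. This is a minimal Python sketch, with the choice of [math]X[/math], the midpoint rule, the helper names `inner_l2`, `e` and the test functions all being assumptions made here for illustration.

```python
import cmath
import math

def inner_l2(f, g, n=1000):
    """Midpoint-rule approximation of <f,g> = int_0^1 f(x) conj(g(x)) dx."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h).conjugate()
                   for k in range(n))

def e(m):
    """The function x -> exp(2 pi i m x) on [0,1]."""
    return lambda x: cmath.exp(2j * math.pi * m * x)

norm_sq_e1 = inner_l2(e(1), e(1))                 # close to 1
cross = inner_l2(e(1), e(2))                      # close to 0
moment = inner_l2(lambda x: x, lambda x: x * x)   # close to 1/4

print(norm_sq_e1, cross, moment)
```

The first two numbers suggest that the exponentials [math]e^{2\pi imx}[/math] are of norm 1 and pairwise orthogonal, and the third one approximates [math]\int_0^1x^3dx=1/4[/math].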

Quite remarkably, any Hilbert space must be of the form [math]L^2(X)[/math], and even of the particular form [math]l^2(I)[/math]. This follows indeed from the following key result:

Theorem

Let [math]H[/math] be a Hilbert space.

  • Any algebraic basis [math]\{f_i\}_{i\in I}[/math] of this space can be turned into an orthonormal basis [math]\{e_i\}_{i\in I}[/math], by using the Gram-Schmidt procedure.
  • Thus, [math]H[/math] has an orthonormal basis, and so we have [math]H\simeq l^2(I)[/math], with [math]I[/math] being the indexing set for this orthonormal basis.


Proof

All this is standard by Gram-Schmidt, the idea being as follows:


(1) First of all, in finite dimensions an orthonormal basis [math]\{e_i\}_{i\in I}[/math] is by definition a usual algebraic basis, satisfying [math] \lt e_i,e_j \gt =\delta_{ij}[/math]. But the existence of such a basis follows by applying the Gram-Schmidt procedure to any algebraic basis [math]\{f_i\}_{i\in I}[/math], as claimed.


(2) In infinite dimensions, a first issue comes from the fact that the standard basis [math]\{\delta_i\}_{i\in\mathbb N}[/math] of the space [math]l^2(\mathbb N)[/math] is not an algebraic basis in the usual sense, with the finite linear combinations of the functions [math]\delta_i[/math] producing only a dense subspace of [math]l^2(\mathbb N)[/math], that of the functions having finite support. Thus, we must fine-tune our definition of “basis”.


(3) But this can be done in two ways, by saying that [math]\{f_i\}_{i\in I}[/math] is a basis of [math]H[/math] when the functions [math]f_i[/math] are linearly independent, and when either the finite linear combinations of these functions [math]f_i[/math] form a dense subspace of [math]H[/math], or the linear combinations with [math]l^2(I)[/math] coefficients of these functions [math]f_i[/math] form the whole [math]H[/math]. For orthogonal bases [math]\{e_i\}_{i\in I}[/math] these definitions are equivalent, and in any case, our statement makes now sense.


(4) Regarding now the proof, in infinite dimensions, this follows again from Gram-Schmidt, exactly as in the finite dimensional case, but by using this time a tool from logic, called Zorn's lemma, in order to correctly do the recurrence.
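In finite dimensions, the Gram-Schmidt procedure from (1) can be sketched in a few lines of Python. This is only an illustration for [math]H=\mathbb C^N[/math], with the helper names `inner`, `gram_schmidt`, the tolerance `1e-12` and the sample basis being assumptions made here, and it says nothing about the infinite dimensional case, where Zorn's lemma is needed.

```python
import math

def inner(x, y):
    """Scalar product on C^N: linear in x, antilinear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Turn linearly independent vectors f_i into an orthonormal family e_i."""
    basis = []
    for f in vectors:
        v = list(f)
        # Subtract the components of v along the vectors already constructed.
        for e in basis:
            c = inner(v, e)
            v = [a - c * b for a, b in zip(v, e)]
        n = math.sqrt(inner(v, v).real)
        if n < 1e-12:
            raise ValueError("the vectors are not linearly independent")
        basis.append([a / n for a in v])
    return basis

# Illustrative algebraic basis of C^3 (an assumption of this sketch).
fs = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]
es = gram_schmidt(fs)

# Gram matrix of the result, which should be the identity.
gram = [[inner(a, b) for b in es] for a in es]
print(gram)
```

Since the scalar product is linear at left, the component of [math]v[/math] along a unit vector [math]e[/math] is [math] \lt v,e \gt e[/math], and this is what gets subtracted at each step.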

The above result, and its relation with Theorem 2.9, is something quite subtle, so let us further get into this. First, we have the following definition, based on the above:

Definition

A Hilbert space [math]H[/math] is called separable when the following equivalent conditions are satisfied:

  • [math]H[/math] has a countable algebraic basis [math]\{f_i\}_{i\in\mathbb N}[/math].
  • [math]H[/math] has a countable orthonormal basis [math]\{e_i\}_{i\in\mathbb N}[/math].
  • We have [math]H\simeq l^2(\mathbb N)[/math], isomorphism of Hilbert spaces.

In what follows we will be mainly interested in the separable Hilbert spaces, where most of the questions coming from quantum physics take place. In view of the above, the following philosophical question appears: why not simply talk about [math]l^2(\mathbb N)[/math]?


In answer to this, we cannot really do so, because many of the separable spaces that we are interested in appear as spaces of functions, and such spaces do not necessarily have a very simple or explicit orthonormal basis, as shown by the following result:

Proposition

The Hilbert space [math]H=L^2[0,1][/math] is separable, having as orthonormal basis the orthonormalized version of the algebraic basis [math]f_n=x^n[/math] with [math]n\in\mathbb N[/math].


Proof

This follows from the Weierstrass theorem, which provides us with the basis [math]f_n=x^n[/math], which can be orthogonalized by using the Gram-Schmidt procedure, as explained in Theorem 2.10. Working out the details here is actually an excellent exercise.
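As a starting point for this exercise, here is a Python sketch which orthogonalizes the first few monomials in exact arithmetic, by using the formula [math] \lt x^m,x^n \gt =\int_0^1x^{m+n}dx=1/(m+n+1)[/math]. The helper names `inner_poly` and `orthogonalize_monomials` are assumptions made here, polynomials are stored as lists of coefficients, and the normalization step is left out, in order to avoid square roots and stay within rational numbers.

```python
from fractions import Fraction

def inner_poly(p, q):
    """<p,q> = int_0^1 p(x) q(x) dx, for real polynomials given as
    coefficient lists [a_0, a_1, ...]. No conjugation is needed here."""
    return sum(a * b * Fraction(1, i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def orthogonalize_monomials(n):
    """Gram-Schmidt, without normalization, applied to 1, x, ..., x^(n-1)."""
    result = []
    for k in range(n):
        p = [Fraction(0)] * k + [Fraction(1)]        # the monomial x^k
        for q in result:
            c = inner_poly(p, q) / inner_poly(q, q)
            q_padded = q + [Fraction(0)] * (len(p) - len(q))
            p = [a - c * b for a, b in zip(p, q_padded)]
        result.append(p)
    return result

polys = orthogonalize_monomials(3)
norms_sq = [inner_poly(p, p) for p in polys]
print(polys)
print(norms_sq)
```

The polynomials obtained are [math]1[/math], [math]x-1/2[/math] and [math]x^2-x+1/6[/math], with squared norms [math]1[/math], [math]1/12[/math] and [math]1/180[/math], so dividing each of them by its norm gives the beginning of the orthonormal basis in the statement.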

As a conclusion to all this, we are interested in one space, namely the unique infinite dimensional separable Hilbert space [math]H[/math], but due to various technical reasons, it is often better to forget that we have [math]H=l^2(\mathbb N)[/math], and say instead that we have [math]H=L^2(X)[/math], with [math]X[/math] being a separable measured space, or simply say that [math]H[/math] is an abstract separable Hilbert space.

General references

Banica, Teo (2024). "Principles of operator algebras". arXiv:2208.03600 [math.OA].

References

  1. R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics, Caltech (1963).
  2. D.J. Griffiths and D.F. Schroeter, Introduction to quantum mechanics, Cambridge Univ. Press (2018).
  3. S. Weinberg, Lectures on quantum mechanics, Cambridge Univ. Press (2012).
  4. P.A.M. Dirac, Principles of quantum mechanics, Oxford Univ. Press (1930).
  5. J. von Neumann, Mathematical foundations of quantum mechanics, Princeton Univ. Press (1955).
  6. H. Weyl, The theory of groups and quantum mechanics, Princeton Univ. Press (1931).
  7. I. Bengtsson and K. Życzkowski, Geometry of quantum states, Cambridge Univ. Press (2006).
  8. M.A. Nielsen and I.L. Chuang, Quantum computation and quantum information, Cambridge Univ. Press (2000).
  9. J. Watrous, The theory of quantum information, Cambridge Univ. Press (2018).
  10. T. Banica, Introduction to modern physics (2024).