12d. Lagrange multipliers

This article was automatically generated from a tex file and may contain conversion errors. If permitted, you may login and edit this article to improve the conversion.

We discuss now some more specialized topics, in relation with optimization. First of all, thinking well, the functions that we have to minimize or maximize, in the real life, are often defined on a manifold, instead of being defined on the whole [math]\mathbb R^N[/math]. Fortunately, the good old principle [math]f'(x)=0[/math] can be adapted to the manifold case, as follows: \begin{principle} In order for a function [math]f:X\to\mathbb R[/math] defined on a manifold [math]X[/math] to have a local extremum at [math]x\in X[/math], we must have, as usual

[[math]] f'(x)=0 [[/math]]

but with this taking into account the fact that the equations defining the manifold count as well as “zero”, and so must be incorporated into the formula [math]f'(x)=0[/math]. \end{principle} Obviously, we are punching here about our weight, because our discussion about manifolds from chapter 11 was quite amateurish, and we have no tools in our bag for proving such things, or even for properly formulating them. However, we can certainly talk about all this, a bit like physicists do. So, our principle will be the one above, and in practice, the idea is that we must have a formula as follows, with [math]g_i[/math] being the constraint functions for our manifold [math]X[/math], and with [math]\lambda_i\in\mathbb R[/math] being certain scalars, called Lagrange multipliers:

[[math]] f'(x)=\sum_i\lambda_ig_i'(x) [[/math]]

As a basic illustration for this, our claim is that, by using a suitable manifold, and a suitable function, and Lagrange multipliers, we can prove in this way the Hölder inequality, that we know well of course, but without any computation. Let us start with:

Proposition

For any exponent [math]p \gt 1[/math], the following set

[[math]] S_p=\left\{x\in\mathbb R^N\Big|\sum_i|x_i|^p=1\right\} [[/math]]

is a submanifold of [math]\mathbb R^N[/math].

Show Proof

We know from chapter 11 that the unit sphere in [math]\mathbb R^N[/math] is a manifold. In our terms, this solves our problem at [math]p=2[/math], because this unit sphere is:

[[math]] S_2=\left\{x\in\mathbb R^N\Big|\sum_ix_i^2=1\right\} [[/math]]

Now observe that we have a bijection [math]S_p\simeq S_2[/math], at least on the part where all the coordinates are positive, [math]x_i \gt 0[/math], given by the following function:

[[math]] x_i\to x_i^{2/p} [[/math]]

Thus we obtain that [math]S_p[/math] is indeed a manifold, as claimed.

■

We already know that the manifold [math]S_p[/math] constructed above is the unit sphere, in the case [math]p=2[/math]. In order to have a better geometric picture of what is going on, in general, observe that [math]S_p[/math] can be constructed as well at [math]p=1[/math], as follows:

[[math]] S_1=\left\{x\in\mathbb R^N\Big|\sum_i|x_i|=1\right\} [[/math]]

However, this is no longer a manifold, as we can see for instance at [math]N=2[/math], where we obtain a square. Now observe that we can talk as well about [math]p=\infty[/math], as follows:

[[math]] S_\infty=\left\{x\in\mathbb R^N\Big|\sup_i|x_i|=1\right\} [[/math]]

This letter set is no longer a manifold either, as we can see for instance at [math]N=2[/math], where we obtain again a square, containing the previous square, the one at [math]p=1[/math].

With these limiting constructions in hand, we can have now a better geometric picture of what is going on, in the general context of Proposition 12.27. Indeed, let us draw, at [math]N=2[/math] for simplifying, our sets [math]S_p[/math] at the values [math]p=1,2,\infty[/math] of the exponent:

[[math]] \xymatrix@R=40pt@C=40pt{ \circ\ar@{--}[r]\ar@{--}[d]&\circ\ar@{--}[r]\ar@{.}[dr]\ar@{.}[dl]\ar@{-}@/^/[dr]&\circ\ar@{--}[d]\\ \circ\ar@{--}[d]\ar@{.}[dr]\ar@{-}@/^/[ur]&\circ&\circ\ar@{--}[d]\ar@{.}[dl]\ar@{-}@/^/[dl]\\ \circ\ar@{--}[r]&\circ\ar@{--}[r]\ar@{-}@/^/[ul]&\circ } [[/math]]

We can see that what we have is a small square, at [math]p=1[/math], becoming smooth and inflating towards the circle, in the parameter range [math]p\in(1,2][/math], and then further inflating, in the parameter range [math]p\in[2,\infty)[/math], towards the big square appearing at [math]p=\infty[/math].

With these preliminaries in hand, we can formulate our result, as follows:

Theorem

The local extrema over [math]S_p[/math] of the function

[[math]] f(x)=\sum_ix_iy_i [[/math]]

can be computed by using Lagrange multipliers, and this gives

[[math]] \left|\sum_ix_iy_i\right|\leq\left(\sum_i|x_i|^p\right)^{1/p}\left(\sum_i|y_i|^q\right)^{1/q} [[/math]]

with [math]1/p+1/q=1[/math], that is, the Hölder inequality, with a purely geometric proof.

Show Proof

We can restrict the attention to the case where all the coordinates are positive, [math]x_i \gt 0[/math] and [math]y_i \gt 0[/math]. The derivative of the function in the statement is:

[[math]] f'(x)=(y_1,\ldots,y_N) [[/math]]

On the other hand, we know that the manifold [math]S_p[/math] appears by definition as the set of zeroes of the function [math]\varphi(x)=\sum_ix_i^p-1[/math], having derivative as follows:

[[math]] \varphi'(x)=p(x_1^{p-1},\ldots,x_N^{p-1}) [[/math]]

Thus, by using Lagrange multipliers, the critical points of [math]f[/math] must satisfy:

[[math]] (y_1,\ldots,y_N)\sim(x_1^{p-1},\ldots,x_N^{p-1}) [[/math]]

In other words, the critical points must satisfy [math]x_i=\lambda y_i^{1/(p-1)}[/math], for some [math]\lambda \gt 0[/math], and by using now [math]\sum_ix_i^p=1[/math] we can compute the precise value of [math]\lambda[/math], and we get:

[[math]] \lambda=\left(\sum_iy_i^{p/(p-1)}\right)^{-1/p} [[/math]]

Now let us see what this means. Since the critical point is unique, this must be a maximum of our function, and we conclude that for any [math]x\in S_p[/math], we have:

[[math]] \sum_ix_iy_i\leq\sum_i\lambda y_i^{1/(p-1)}\cdot y_i=\left(\sum_iy_i^{p/(p-1)}\right)^{1-1/p}=\left(\sum_iy_i^q\right)^{1/q} [[/math]]

Thus we have Hölder, and the general case follows from this, by rescaling.

■

As a second illustration for the method of Lagrange multipliers, this time in relation with certain questions from linear algebra, let us go back to the Hadamard matrices, that we met in chapter 7. In the real case, the basic theory of these matrices is as follows:

Theorem

The real Hadamard matrices, [math]H\in M_N(-1,1)[/math] having pairwise orthogonal rows, have the following properties:

The set of Hadamard matrices is [math]X_N=M_N(-1,1)\cap\sqrt{N}O_N[/math].
In order to have [math]X_N\neq\emptyset[/math], the matrix size must be [math]N\in\{2\}\cup 4\mathbb N[/math].
For [math]H\in M_N(-1,1)[/math] we have [math]|\det H|\leq N^{N/2}[/math], with equality when [math]H[/math] is Hadamard.
For [math]U\in O_N[/math] we have [math]||U||_1\leq N\sqrt{N}[/math], with equality when [math]H=\sqrt{N}U[/math] is Hadamard.

Show Proof

Many things going on here, the idea being as follows:

(1) This is just a reformulation of the Hadamard matrix condition.

(2) This follows by playing with the first 3 rows, exercise for you.

(3) This follows from our definition of the determinant, as a signed volume.

(4) This follows from [math]||U||_2=\sqrt{N}[/math] and Cauchy-Schwarz, easy exercise for you.

■

All the above is quite interesting, and (1,2) raise the question of finding the correct generalizations of the Hadamard matrices, at [math]N\notin\{2\}\cup 4\mathbb N[/math]. But the answer here comes from (3,4), which suggest looking either at the maximizers of [math]|\det|[/math] on [math]M_N(-1,1)[/math], or of [math]||.||_1[/math] on [math]\sqrt{N}O_N[/math]. By following this latter way, we are led to the following question: \begin{question} What are the critical points of the [math]1[/math]-norm on [math]O_N[/math]? \end{question} And, good news, we can solve this latter question by using the theory of Lagrange multipliers developed in the above, the result here being as follows:

Theorem

An orthogonal matrix with nonzero entries is a critical point of

[[math]] ||U||_1=\sum_{ij}|U_{ij}| [[/math]]

precisely when [math]SU^t[/math] is symmetric, where [math]S_{ij}={\rm sgn}(U_{ij})[/math].

Show Proof

We regard [math]O_N[/math] as a real algebraic manifold, with coordinates [math]U_{ij}[/math]. This manifold consists by definition of the zeroes of the following polynomials:

[[math]] A_{ij}=\sum_kU_{ik}U_{jk}-\delta_{ij} [[/math]]

Thus [math]U\in O_N[/math] is a critical point of [math]F(U)=||U||_1[/math] when the following is satisfied:

[[math]] dF\in span(dA_{ij}) [[/math]]

Regarding the space [math]span(dA_{ij})[/math], this consists of the following quantities:

[[math]] \begin{eqnarray*} \sum_{ij}M_{ij}dA_{ij} &=&\sum_{ijk}M_{ij}(U_{ik}dU_{jk}+U_{jk}dU_{ik})\\ &=&\sum_{ij}(M^tU)_{ij}dU_{ij}+\sum_{ij}(MU)_{ij}dU_{ij} \end{eqnarray*} [[/math]]

In order to compute [math]dF[/math], observe first that, with [math]S_{ij}=sgn(U_{ij})[/math], we have:

[[math]] d|U_{ij}| =d\sqrt{U_{ij}^2} =\frac{U_{ij}dU_{ij}}{|U_{ij}|} =S_{ij}dU_{ij} [[/math]]

Thus [math]dF=\sum_{ij}S_{ij}dU_{ij}[/math], and so [math]U\in O_N[/math] is a critical point of [math]F[/math] precisely when there exists a matrix [math]M\in M_N(\mathbb R)[/math] such that the following two conditions are satisfied:

[[math]] S=M^tU\quad,\quad S=MU [[/math]]

Now observe that these two equations can be written as follows:

[[math]] M^t=SU^t\quad,\quad M=SU^t [[/math]]

Thus, the matrix [math]SU^t[/math] must be symmetric, as claimed.

■

General references

Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].