12a. Higher derivatives



Welcome back to analysis, and we hope that you survived the previous chapter. Good news, in this chapter we go for the real thing, namely the computation of maxima and minima of functions. Want to optimize the characteristics of your building, bridge, plane, engine, medicine, computer, or casino and other racketeering operations? It all comes down, you guessed right, to maximizing or minimizing a certain function.


In one variable things are quite easy, at least from a theoretical viewpoint, due to the following result, that we know well from Part I of the present book:

Theorem 12.1

The one-variable smooth functions are subject to the Taylor formula

[[math]] f(x+t)=\sum_{k=0}^\infty\frac{f^{(k)}(x)}{k!}\,t^k [[/math]]
which allows us, via suitable truncations, to determine the local maxima and minima.


Proof

This is a compact summary of what we know from Part I, with everything being in fact quite technical, and with the idea being as follows:


(1) In order to compute the local maxima and minima, a first method is by using the following formula, which comes straight from the definition of the derivative:

[[math]] f(x+t)\simeq f(x)+f'(x)t [[/math]]


Indeed, this formula shows that when [math]f'(x)\neq0[/math], the point [math]x[/math] cannot be a local minimum or maximum, due to the fact that [math]t\to-t[/math] will invert the growth. Thus, in order to find the local minima and maxima, we must first compute the points [math]x[/math] satisfying [math]f'(x)=0[/math], and then perform a more detailed study of each solution [math]x[/math] that we find.


(2) In relation to the problems left, the second derivative comes to the rescue. Indeed, we can use the following more advanced formula, coming via l'Hôpital's rule:

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{f''(x)}{2}\,t^2 [[/math]]


To be more precise, assume that we have [math]f'(x)=0[/math], as required by the study in (1). Then this second order formula simply reads:

[[math]] f(x+t)\simeq f(x)+\frac{f''(x)}{2}\,t^2 [[/math]]


But this is something very useful, telling us that when [math]f''(x) \lt 0[/math], what we have is a local maximum, and when [math]f''(x) \gt 0[/math], what we have is a local minimum. As for the remaining case, [math]f''(x)=0[/math], things here remain open: for instance [math]f(x)=x^4[/math] has a local minimum at [math]x=0[/math], while [math]f(x)=x^3[/math] has no local extremum there, both with vanishing second derivative.


(3) All this is very useful in practice, and with what we have in (1), complemented if needed with what we have in (2), we can in principle compute the local minima and maxima, without much trouble. However, if really needed, more tools are available. Indeed, we can use if we want the order 3 Taylor formula, which is as follows:

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{f''(x)}{2}\,t^2+\frac{f'''(x)}{6}\,t^3 [[/math]]


To be more precise, assume that we are in the case [math]f'(x)=f''(x)=0[/math], which is where our joint algorithm coming from (1) and (2) fails. In this case, our formula becomes:

[[math]] f(x+t)\simeq f(x)+\frac{f'''(x)}{6}\,t^3 [[/math]]


But this solves the problem in the case [math]f'''(x)\neq0[/math], because here we cannot have a local minimum or maximum, due to [math]t\to-t[/math] which switches growth. As for the remaining case, [math]f'''(x)=0[/math], things here remain open, and we have to go to higher order.


(4) Summarizing, we have a recurrence method for solving our problem. In order to formulate now an abstract result about this, we can use the Taylor formula at order [math]n[/math]:

[[math]] f(x+t)\simeq\sum_{k=0}^n\frac{f^{(k)}(x)}{k!}\,t^k [[/math]]


Indeed, assume that we started to compute the derivatives [math]f'(x),f''(x),f'''(x),\ldots[/math] of our function at the point [math]x[/math], with the goal of finding the first such derivative which does not vanish, and that we found this derivative, as being the order [math]n[/math] one:

[[math]] f'(x)=f''(x)=\ldots=f^{(n-1)}(x)=0\quad,\quad f^{(n)}(x)\neq0 [[/math]]


Then, the Taylor formula at [math]x[/math] at order [math]n[/math] takes the following form:

[[math]] f(x+t)\simeq f(x)+\frac{f^{(n)}(x)}{n!}\,t^n [[/math]]


But this is exactly what we need, in order to fully solve our local extremum problem. Indeed, when [math]n[/math] is even, if [math]f^{(n)}(x) \lt 0[/math] what we have is a local maximum, and if [math]f^{(n)}(x) \gt 0[/math], what we have is a local minimum. As for the case where [math]n[/math] is odd, here we cannot have a local minimum or maximum, due to [math]t\to-t[/math] which switches growth.
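
As a computer illustration of this recursive method, here is a minimal sketch in Python, assuming the sympy library is available, which classifies a critical point by searching for the first nonvanishing derivative, exactly as in (4) above:

```python
# Sketch: classify a critical point x0 of a smooth one-variable function,
# by finding the first nonvanishing derivative there, as in (4) above.
import sympy as sp

def classify(f, x, x0, max_order=10):
    """Return 'min', 'max', 'not an extremum' or 'undecided' at x0."""
    for n in range(1, max_order + 1):
        dn = sp.diff(f, x, n).subs(x, x0)
        if dn != 0:
            if n % 2 == 1:
                return "not an extremum"       # odd order: t -> -t flips growth
            return "min" if dn > 0 else "max"  # even order: sign of f^(n)(x0)
    return "undecided"

x = sp.symbols('x')
print(classify(x**4, x, 0))       # min: f'=f''=f'''=0 at 0, f''''(0)=24>0
print(classify(x**3, x, 0))       # not an extremum: f'''(0)=6, odd order
print(classify(sp.cos(x), x, 0))  # max: f''(0)=-1<0
```

As a small warning here, for functions like [math]e^{-1/x^2}[/math], extended with value [math]0[/math] at [math]x=0[/math], all the derivatives vanish at [math]0[/math], so such a search can never terminate, and this is what the max_order cutoff above is for.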

All the above, meaning Theorem 12.1 and its proof, must of course be perfectly known, when looking for applications of such things. However, for theoretical purposes, let us record as well, in a very compact form, what is basically to be remembered:

Theorem 12.2

Given a smooth function [math]f:\mathbb R\to\mathbb R[/math], we can always write

[[math]] f(x+t)\simeq f(x)+\frac{f^{(n)}(x)}{n!}\,t^n [[/math]]
with [math]f^{(n)}(x)\neq0[/math], and this tells us if [math]x[/math] is a local minimum, or maximum of [math]f[/math].


Proof

This was the conclusion of the proof of Theorem 12.1, with the extra remark that a local extremum requires [math]n[/math] to be even, in which case [math]f^{(n)}(x) \lt 0[/math] corresponds to a local maximum, and [math]f^{(n)}(x) \gt 0[/math] corresponds to a local minimum.

In several variables now, things will be quite tricky, making full use of the material that we learned in the previous three chapters, and even more, requiring some extensions of that material. Indeed, we need a lot of knowledge, in order to solve our problems:


(1) The linear algebra from chapter 9 is the backbone of multivariable calculus, so we will surely need all that, and more. For instance, since the second derivatives [math]f''(x)[/math] will now be matrices, expect some sort of positivity theory for matrices to be needed.


(2) Regarding the partial derivatives from chapter 10, no question about it, all that material is certainly useful, and the more theory we have here, the better it will be, for our questions. The problem is that of iterating those partial derivative operations.


(3) Finally, regarding the geometry from chapter 11, the functions that we want to study will be naturally defined either on [math]\mathbb R^N[/math], or on spheres, ellipses, cylinders and other differential manifolds [math]X\subset\mathbb R^N[/math], so we will certainly need that material too.


Getting started now, we can talk about higher derivatives, in the obvious way, simply by performing the operation of taking derivatives recursively. As a first result here, we have:

Theorem 12.3

Given a continuous function [math]f:\mathbb R^N\to\mathbb R[/math], we can talk about its higher derivatives, defined recursively as

[[math]] \frac{d^kf}{dx_{i_1}\ldots dx_{i_k}}=\frac{d}{dx_{i_1}}\cdots\frac{d}{dx_{i_k}}(f) [[/math]]
provided that all these derivatives exist indeed. Moreover, due to the Clairaut formula,

[[math]] \frac{d^2f}{dx_idx_j}=\frac{d^2f}{dx_jdx_i} [[/math]]
the order in which these higher derivatives are computed is irrelevant.


Proof

There are several things going on here, the idea being as follows:


(1) First of all, we can talk about the quantities in the statement, with the remark however that at each step of our recursion, the corresponding partial derivative can exist or not. We will say in what follows that our function is [math]k[/math] times differentiable if the quantities in the statement exist at any order [math]l\leq k[/math], and smooth, if this works with [math]k=\infty[/math].


(2) Regarding now the second assertion, this is something more tricky. Let us first recall from chapter 8 that the second derivatives of a twice differentiable function of two variables [math]f:\mathbb R^2\to\mathbb R[/math] are subject to the Clairaut formula, namely:

[[math]] \frac{d^2f}{dxdy}=\frac{d^2f}{dydx} [[/math]]


(3) But this result clearly extends to our function [math]f:\mathbb R^N\to\mathbb R[/math], simply by ignoring the unneeded variables, so we have the Clairaut formula in general, also called Schwarz formula, which is the one in the statement, namely:

[[math]] \frac{d^2f}{dx_idx_j}=\frac{d^2f}{dx_jdx_i} [[/math]]


(4) Now observe that this tells us that the order in which the higher derivatives are computed is irrelevant. That is, we can permute the order of our partial derivative computations, and a standard way of doing this is by differentiating first with respect to [math]x_1[/math], as many times as needed, then with respect to [math]x_2[/math], and so on. Thus, the collection of partial derivatives can be written, in a more convenient form, as follows:

[[math]] \frac{d^kf}{dx_1^{k_1}\ldots dx_N^{k_N}}=\frac{d^{k_1}}{dx_1^{k_1}}\cdots\frac{d^{k_N}}{dx_N^{k_N}}(f) [[/math]]


(5) To be more precise, here [math]k\in\mathbb N[/math] is as usual the global order of our derivatives, the exponents [math]k_1,\ldots,k_N\in\mathbb N[/math] are subject to the condition [math]k_1+\ldots+k_N=k[/math], and the operations on the right are the familiar one-variable higher derivative operations.


(6) This being said, for certain tricky questions it is more convenient not to order the indices, or rather to order them according to whatever order best fits our computation, so what we have in the statement is the good formula, and (4-5) are mere remarks.


(7) And with the remark too that for trivial questions, what we have in the statement is the good formula, simply because there are fewer indices to be written, when compared to what we have to write when using the ordering procedure in (4-5) above.
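
As a computer verification of all this, here is a quick sketch in Python, again assuming sympy is available, checking the Clairaut formula on a sample function, along with the fact that reordering an order 3 derivative is harmless:

```python
# Sketch: verify the Clairaut/Schwarz symmetry on a sample smooth function.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.exp(x*y) * sp.sin(y*z)

# Clairaut: d^2f/dxdy = d^2f/dydx
assert sp.simplify(sp.diff(f, x, y) - sp.diff(f, y, x)) == 0

# Order k=3: permuting the differentiation order gives the same result
assert sp.simplify(sp.diff(f, x, y, z) - sp.diff(f, z, y, x)) == 0
print("mixed partials agree")
```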

All this is very nice, and as an illustration for the above, let us work out the case [math]k=2[/math]. Here things are quite special, and we can formulate the following definition:

Definition 12.4

Given a twice differentiable function [math]f:\mathbb R^N\to\mathbb R[/math], we set

[[math]] f''(x)=\left(\frac{d^2f}{dx_idx_j}\right)_{ij} [[/math]]
which is a symmetric matrix, called the Hessian matrix of [math]f[/math] at the point [math]x\in\mathbb R^N[/math].

To be more precise, we know that when [math]f:\mathbb R^N\to\mathbb R[/math] is twice differentiable, its order [math]k=2[/math] partial derivatives are the numbers in the statement. Now since these numbers naturally form an [math]N\times N[/math] matrix, the temptation is high to call this matrix [math]f''(x)[/math], and so we will do. And finally, we know from Clairaut that this matrix is symmetric:

[[math]] f''(x)_{ij}=f''(x)_{ji} [[/math]]


Observe that at [math]N=1[/math] this is compatible with our previous definition of the second derivative [math]f''[/math], and this because in this case, the [math]1\times1[/math] matrix from Definition 12.4 is:

[[math]] f''(x)=(f''(x))\in M_{1\times1}(\mathbb R) [[/math]]


As a word of warning, however, never use Definition 12.4 for functions [math]f:\mathbb R^N\to\mathbb R^M[/math], where the second derivative can only be something more complicated. Also, never attempt to do something similar at [math]k=3[/math] or higher, for functions [math]f:\mathbb R^N\to\mathbb R[/math] with [math]N \gt 1[/math], because again, that beast has too many indices to be a true, honest matrix.
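
For concrete computations, and as an illustration only, here is a sketch of how the Hessian matrix can be obtained with sympy, whose built-in hessian function matches Definition 12.4:

```python
# Sketch: compute a Hessian matrix symbolically, and check its symmetry.
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y + sp.exp(x*y)

H = sp.hessian(f, (x, y))   # the 2x2 matrix (d^2f/dx_i dx_j)_{ij}
print(H)
assert sp.simplify(H - H.T) == sp.zeros(2, 2)   # symmetric, by Clairaut
```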


Back now to business, with these notions in hand, we have the following question to be solved:

Question 12.5

What is the Taylor formula for a function

[[math]] f:\mathbb R^N\to\mathbb R [[/math]]

and how can this be used for computing the local minima and maxima of [math]f[/math]?

We will solve this slowly, a bit as we did in the proof of Theorem 12.1, in the [math]N=1[/math] case. Let us start with something that we know well from chapter 10, namely:

[[math]] f(x+t)\simeq f(x)+f'(x)t [[/math]]


To be more precise, we know that this formula holds indeed, with the derivative [math]f'(x)[/math] being by definition the horizontal vector formed by the partial derivatives, and with [math]t\in\mathbb R^N[/math] being regarded as usual as a column vector, the formula for [math]f'(x)t[/math] being:

[[math]] \begin{eqnarray*} f'(x)t &=&\left(\frac{df}{dx_1}\ \ldots\ \frac{df}{dx_N}\right) \begin{pmatrix}t_1\\ \vdots\\ t_N\end{pmatrix}\\ &=&\sum_{i=1}^N\frac{df}{dx_i}\,t_i\\ &\in&\mathbb R \end{eqnarray*} [[/math]]


Here we have of course identified the [math]1\times1[/math] matrices with their numeric content. As a consequence, in analogy with what we know in one variable, we can formulate:

Theorem 12.6

The Taylor formula at order [math]1[/math] for a function [math]f:\mathbb R^N\to\mathbb R[/math] is

[[math]] f(x+t)\simeq f(x)+f'(x)t [[/math]]
and in particular, in order for [math]x[/math] to be a local extremum, we must have [math]f'(x)=0[/math].


Proof

Here the first assertion is something that we know, as explained above, and the second assertion follows from it. Indeed, let us look at the order 1 term, given by:

[[math]] f'(x)t=\sum_{i=1}^N\frac{df}{dx_i}\,t_i [[/math]]


Now since this linear combination of the entries of [math]t\in\mathbb R^N[/math] can take both positive and negative values, unless all the coefficients vanish, which means [math]f'(x)=0[/math], we are led to the conclusion that a local extremum requires [math]f'(x)=0[/math], as stated.
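
In practice, the equation [math]f'(x)=0[/math] can be solved by hand, or symbolically. As an illustration, on a toy function of our own choosing, here is a sketch with sympy:

```python
# Sketch: find the candidate extrema of f(x,y) = x^2 + y^2 - xy - x,
# by solving the vector equation f'(x) = 0.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2 - x*y - x

grad = [sp.diff(f, v) for v in (x, y)]    # (2x - y - 1, 2y - x)
print(sp.solve(grad, (x, y), dict=True))  # [{x: 2/3, y: 1/3}]
```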

Let us discuss now the Taylor formula at order 2. We have here:

Theorem 12.7

Given a twice differentiable function [math]f:\mathbb R^N\to\mathbb R[/math], we have

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{ \lt f''(x)t,t \gt }{2} [[/math]]
where [math]f''(x)\in M_N(\mathbb R)[/math] stands as usual for the Hessian matrix.


Proof

This is something more tricky, the idea being as follows:


(1) As a first observation, at [math]N=1[/math] the Hessian matrix as constructed in Definition 12.4 is the [math]1\times1[/math] matrix having as entry the second derivative [math]f''(x)[/math], and the formula in the statement is something that we know well from Part I, namely:

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{f''(x)t^2}{2} [[/math]]


(2) In general now, this is in fact something which does not need a new proof, because it follows from the one-variable formula above, applied to the restriction of [math]f[/math] to the following segment in [math]\mathbb R^N[/math], which can be regarded as being a one-variable interval:

[[math]] I=[x,x+t] [[/math]]


To be more precise, let [math]y\in\mathbb R^N[/math], and consider the following function, with [math]r\in\mathbb R[/math]:

[[math]] g(r)=f(x+ry) [[/math]]


We know from (1) that the Taylor formula for [math]g[/math], at the point [math]r=0[/math], reads:

[[math]] g(r)\simeq g(0)+g'(0)r+\frac{g''(0)r^2}{2} [[/math]]


And our claim is that, with [math]t=ry[/math], this is precisely the formula in the statement.


(3) So, let us see if our claim is correct. By using the chain rule, we have the following formula, with on the right, as usual, a row vector multiplied by a column vector:

[[math]] g'(r)=f'(x+ry)\cdot y [[/math]]


By using again the chain rule, we can compute the second derivative as well:

[[math]] \begin{eqnarray*} g''(r) &=&(f'(x+ry)\cdot y)'\\ &=&\left(\sum_i\frac{df}{dx_i}(x+ry)\cdot y_i\right)'\\ &=&\sum_i\sum_j\frac{d^2f}{dx_idx_j}(x+ry)\cdot\frac{d(x+ry)_j}{dr}\cdot y_i\\ &=&\sum_i\sum_j\frac{d^2f}{dx_idx_j}(x+ry)\cdot y_iy_j\\ &=& \lt f''(x+ry)y,y \gt \end{eqnarray*} [[/math]]


(4) Time now to conclude. We know that we have [math]g(r)=f(x+ry)[/math], and according to our various computations above, we have the following formulae:

[[math]] g(0)=f(x)\quad,\quad g'(0)=f'(x)\quad,\quad g''(0)= \lt f''(x)y,y \gt [[/math]]


But with this data in hand, the usual Taylor formula for our one-variable function [math]g[/math], at order 2, at the point [math]r=0[/math], takes the following form, with [math]t=ry[/math]:

[[math]] \begin{eqnarray*} f(x+ry) &\simeq&f(x)+f'(x)ry+\frac{ \lt f''(x)y,y \gt r^2}{2}\\ &=&f(x)+f'(x)t+\frac{ \lt f''(x)t,t \gt }{2} \end{eqnarray*} [[/math]]


Thus, we have obtained the formula in the statement.


(5) Finally, for completeness, let us record as well a more numeric formulation of what we found. According to our usual rules for matrix calculus, what we found is:

[[math]] f(x+t)\simeq f(x)+\sum_{i=1}^N\frac{df}{dx_i}\,t_i+\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\frac{d^2f}{dx_idx_j}\,t_it_j [[/math]]


Observe that, since the Hessian matrix [math]f''(x)[/math] is symmetric, most of the terms on the right will appear in pairs, making it clear what the [math]1/2[/math] is there for, namely avoiding redundancies. However, this is only true for the off-diagonal terms, so instead of further messing up our numeric formula above, we will just leave it like this.
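
As a numeric sanity check of this order 2 formula, on a toy function of our own choosing, we can compare both sides directly, say with numpy:

```python
# Sketch: check f(x+t) ~ f(x) + f'(x)t + <f''(x)t,t>/2, for f = sin(x1)exp(x2).
import numpy as np

def f(v):
    return np.sin(v[0]) * np.exp(v[1])

x = np.array([0.5, -0.3])
t = np.array([1e-2, 2e-2])

s, c, e = np.sin(x[0]), np.cos(x[0]), np.exp(x[1])
grad = np.array([c * e, s * e])    # (df/dx1, df/dx2)
hess = np.array([[-s * e, c * e],  # the 2x2 Hessian matrix
                 [ c * e, s * e]])

taylor2 = f(x) + grad @ t + 0.5 * t @ hess @ t
print(abs(f(x + t) - taylor2))     # tiny, of order |t|^3
```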

We can now go back to local extrema, and we have, improving Theorem 12.6:

Theorem 12.8

In order for a twice differentiable function [math]f:\mathbb R^N\to\mathbb R[/math] to have a local minimum or maximum at [math]x\in\mathbb R^N[/math], the first derivative must vanish there,

[[math]] f'(x)=0 [[/math]]
and the Hessian must be positive or negative, in the sense that the quantities

[[math]] \lt f''(x)t,t \gt \in\mathbb R [[/math]]
must keep a constant sign, positive or negative, when [math]t\in\mathbb R^N[/math] varies.


Proof

This is clear from Theorem 12.7. Consider indeed the formula there, namely:

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{ \lt f''(x)t,t \gt }{2} [[/math]]


It is clear then that, in order for our function to have a local minimum or maximum at [math]x\in\mathbb R^N[/math], the first derivative must vanish there, [math]f'(x)=0[/math]. Moreover, with this assumption made, the approximation that we have around [math]x[/math] becomes:

[[math]] f(x+t)\simeq f(x)+\frac{ \lt f''(x)t,t \gt }{2} [[/math]]


Thus, we are led to the conclusion in the statement.

As a conclusion to our study so far, our analytic questions lead us into a linear algebra question, regarding the square matrices of type [math]f''(x)\in M_N(\mathbb R)[/math], and more specifically the positivity properties of the following quantities, when [math]t\in\mathbb R^N[/math] varies:

[[math]] \lt f''(x)t,t \gt \in\mathbb R [[/math]]


This is actually quite a subtle question, and many things can be said here, after some linear algebra work. We will be back to this in a moment.
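
While waiting for that linear algebra discussion, let us record a sketch of the standard numeric criterion, under the assumption that the matrix is symmetric, as the Hessian is: the sign behavior of the above quantities is governed by the eigenvalues, which numpy can compute:

```python
# Sketch: classify a critical point via the eigenvalues of the Hessian H,
# assumed symmetric, as produced by Definition 12.4.
import numpy as np

def classify_critical_point(H, tol=1e-12):
    eig = np.linalg.eigvalsh(H)      # real eigenvalues, for symmetric H
    if np.all(eig > tol):
        return "local minimum"       # <Ht,t> > 0 for all t != 0
    if np.all(eig < -tol):
        return "local maximum"       # <Ht,t> < 0 for all t != 0
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"        # <Ht,t> changes sign
    return "degenerate, undecided"   # some eigenvalues vanish numerically

print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 3.0]])))   # local minimum
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -1.0]])))  # saddle point
```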


Finally, at higher order things become more complicated, as follows:

Theorem 12.9

Given a [math]k[/math] times differentiable function [math]f:\mathbb R^N\to\mathbb R[/math], we have

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{ \lt f''(x)t,t \gt }{2}+\ldots [[/math]]
and this helps in identifying the local extrema, when [math]f'(x)=0[/math] and [math]f''(x)=0[/math].


Proof

The study here is very similar to that at [math]k=2[/math], from the proof of Theorem 12.7, with everything coming from the usual Taylor formula, applied on:

[[math]] I=[x,x+t] [[/math]]


We will leave this as an instructive exercise, for your long summer nights.
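
And for those wanting to experiment before those summer nights arrive, here is a small sketch with sympy, using precisely the trick from the proof of Theorem 12.7, namely expanding [math]g(r)=f(x+ry)[/math] in one variable:

```python
# Sketch: the coefficient of r^k in g(r) = f(x + r*y) packages all the
# order k partial derivatives of f, giving the higher Taylor terms.
import sympy as sp

x1, x2, y1, y2, r = sp.symbols('x1 x2 y1 y2 r')

f = sp.cos(x1) + x1 * x2**2
g = f.subs({x1: x1 + r*y1, x2: x2 + r*y2}, simultaneous=True)

print(sp.series(g, r, 0, 4))   # Taylor expansion in r, up to order 3
```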

General references

Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].