10c. Partial derivatives
Let us now discuss differentiability in several variables. At order 1, the situation is quite similar to the one in 1 variable, but this time involving matrices. In order to explain this material, let us start with a straightforward definition, as follows:
We say that a map [math]f:\mathbb R^N\to\mathbb R^M[/math] is differentiable at [math]x\in\mathbb R^N[/math] if there exists a linear map [math]f'(x):\mathbb R^N\to\mathbb R^M[/math] making the following approximation formula work, for [math]t\in\mathbb R^N[/math] small:
[math]f(x+t)\simeq f(x)+f'(x)t[/math]
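Before going further, here is a quick numerical sketch of this definition, in Python with numpy, using a hypothetical smooth map and its hand-computed derivative matrix; the point is that the error of the linear approximation, divided by [math]||t||[/math], goes to [math]0[/math]:

```python
import numpy as np

# Illustration of the definition (a hypothetical example, not from the text):
# f(x, y) = (x * y, sin x) is differentiable, with derivative the linear map
# given by the matrix [[y, x], [cos x, 0]], computed by hand.
def f(v):
    x, y = v
    return np.array([x * y, np.sin(x)])

def f_prime(v):
    x, y = v
    return np.array([[y, x], [np.cos(x), 0.0]])

x = np.array([1.0, 2.0])
ratios = []
for eps in [1e-1, 1e-2, 1e-3]:
    t = eps * np.array([0.6, -0.8])          # a direction with ||t|| = eps
    err = np.linalg.norm(f(x + t) - f(x) - f_prime(x) @ t)
    ratios.append(err / eps)                  # error / ||t||, should tend to 0
```

Running this, the ratios decrease roughly linearly in [math]\varepsilon[/math], exactly as the definition requires.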
But is this the correct definition? I can hear you screaming that we are probably going the wrong way, because for functions [math]f:\mathbb R\to\mathbb R[/math] the derivative is something much simpler, as follows, and that we should try to imitate, in our higher dimensional setting:
[math]f'(x)=\lim_{t\to0}\frac{f(x+t)-f(x)}{t}[/math]
However, this is not possible, for a number of reasons, that are worth discussing in detail. So, here is the discussion, answering all kinds of questions that you might have:
(1) First of all, the above formula does not make any sense for a function [math]f:\mathbb R^N\to\mathbb R^M[/math] with [math]N\neq M[/math], because we cannot divide oranges by apples. And it doesn't make sense either at [math]N=M\in\mathbb N[/math], because, well, here we have [math]\mathbb R^N[/math] oranges, I agree with you, but there is no way of dividing these oranges, unless we are in the special cases [math]N=1,2[/math].
(2) More philosophically now, we have seen that having [math]f'(x)[/math] defined as a number is difficult, but the question is, do we really want to have [math]f'(x)[/math] defined as a number? And my claim here is that this would be a pity. Think of the case where [math]f:\mathbb R^N\to\mathbb R^M[/math] is linear. Such a map is just “perfect”, and so should equal its own derivative, [math]f=f'[/math].
(3) Summarizing, our Definition 10.18 is just perfection, and is waiting for some further study, and this is what we will do. And in case you're still secretly dreaming about having [math]f'(x)[/math] defined as some sort of number, wait for it. When [math]N=M[/math] at least, there is indeed a lucky number, namely [math]\det(f'(x))[/math], called the Jacobian, but more on this later.
Getting back now to Definition 10.18 as formulated, and agreed upon, we have there a linear map [math]f'(x):\mathbb R^N\to\mathbb R^M[/math], waiting to be further understood. So, time now to use our linear algebra knowledge from chapter 9. We know from there that such linear maps correspond to rectangular matrices [math]A\in M_{M\times N}(\mathbb R)[/math], and we are led in this way to:
\begin{question}
Given a differentiable map [math]f:\mathbb R^N\to\mathbb R^M[/math], in the abstract sense of Definition 10.18, what exactly is its derivative
regarded as a rectangular matrix, [math]f'(x)\in M_{M\times N}(\mathbb R)[/math]?
\end{question}
Again, I might hear you scream here, arguing that you come out of a long battle, having just agreed that the derivative is a linear map, and not a number, and now what, we are trying to replace this linear map by a matrix, and so by a bunch of numbers.
Good point, and I must admit that I have no good answer to this. In fact, what we are doing here, namely Definition 10.18, then Question 10.19, and finally Theorem 10.20 to follow in a moment, are quite deep things, that took mankind several centuries to develop, and that we are now presenting in a compressed form. So yes, all this is difficult mathematics, when you first see it, I perfectly agree with you.
In any case, I hope that you're still with me, and in order to further clarify all this, here is the answer to Question 10.19:
The derivative of a differentiable function [math]f:\mathbb R^N\to\mathbb R^M[/math], making the approximation formula
[math]f(x+t)\simeq f(x)+f'(x)t[/math]
work, is the rectangular matrix formed by the partial derivatives at [math]x[/math],
[math]f'(x)=\left(\frac{df_i}{dx_j}(x)\right)_{ij}\in M_{M\times N}(\mathbb R)[/math]
acting on the vectors [math]t\in\mathbb R^N[/math] by usual multiplication.
As a first observation, the formula in the statement makes sense indeed, as an equality, or rather approximation, of vectors in [math]\mathbb R^M[/math], as follows:
[math]f_i(x+t)\simeq f_i(x)+\sum_{j=1}^N\frac{df_i}{dx_j}(x)t_j\quad,\quad\forall i[/math]
In order to prove now this formula, which does make sense, the idea is as follows:
(1) First of all, at [math]N=M=1[/math] what we have is a usual 1-variable function [math]f:\mathbb R\to\mathbb R[/math], and the formula in the statement is something that we know well, namely:
[math]f(x+t)\simeq f(x)+f'(x)t[/math]
(2) Let us discuss now the case [math]N=2,M=1[/math]. Here what we have is a function [math]f:\mathbb R^2\to\mathbb R[/math], and by using twice the basic approximation result from (1), we obtain:
[math]f(x+t,y+s)\simeq f(x,y)+\frac{df}{dx}(x,y)t+\frac{df}{dy}(x,y)s[/math]
(3) More generally, we can deal in this way with the general case [math]M=1[/math], with the formula here, obtained via a straightforward recurrence, being as follows:
[math]f(x+t)\simeq f(x)+\sum_{i=1}^N\frac{df}{dx_i}(x)t_i[/math]
(4) But this gives the result in the case where both [math]N,M\in\mathbb N[/math] are arbitrary too. Indeed, consider a function [math]f:\mathbb R^N\to\mathbb R^M[/math], and let us write it as follows:
[math]f=(f_1,\ldots,f_M)[/math]
We can apply (3) to each of the components [math]f_i:\mathbb R^N\to\mathbb R[/math], and we get:
[math]f_i(x+t)\simeq f_i(x)+\sum_{j=1}^N\frac{df_i}{dx_j}(x)t_j[/math]
But this collection of [math]M[/math] formulae tells us precisely that the following happens, as an equality, or rather approximation, of vectors in [math]\mathbb R^M[/math]:
[math]f(x+t)\simeq f(x)+\left(\frac{df_i}{dx_j}(x)\right)_{ij}t[/math]
Thus, we are led to the conclusion in the statement.
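As a numerical complement to the above theorem, here is a small Python sketch, with numpy, using a hypothetical map [math]f:\mathbb R^2\to\mathbb R^2[/math]: the rectangular matrix of partial derivatives, approximated here by central finite differences, indeed makes the approximation formula work.

```python
import numpy as np

# Numerical sketch of the theorem (a hypothetical example, not from the text):
# for f(x, y) = (x^2 + y, x * y), the matrix of partial derivatives,
# approximated by central differences, satisfies f(x + t) ~ f(x) + f'(x) t.
def f(v):
    x, y = v
    return np.array([x**2 + y, x * y])

def jacobian(f, x, h=1e-6):
    """M x N matrix of partial derivatives df_i/dx_j at x, by central differences."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = h
        cols.append((f(x + e) - f(x - e)) / (2 * h))
    return np.column_stack(cols)

x = np.array([1.0, 2.0])
A = jacobian(f, x)                      # hand computation gives [[2, 1], [2, 1]]
t = np.array([1e-4, -1e-4])
error = np.linalg.norm(f(x + t) - f(x) - A @ t)   # tiny compared to ||t||
```

The error here is of order [math]||t||^2[/math], much smaller than [math]||t||[/math] itself, which is the content of the approximation formula.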
The above result, while being very nice, clear and useful, does not close the discussion regarding the derivative. Indeed, conversely, we would like to know if the existence of partial derivatives guarantees that [math]f[/math] is differentiable, with derivative [math]f'(x)[/math] appearing as the rectangular matrix formed by the partial derivatives at [math]x[/math].
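In order to get some intuition here, let us record a classical counterexample, which is standard textbook material, not specific to the present text: a function having partial derivatives everywhere, without being differentiable, or even continuous, at the origin. A quick numerical sketch, in Python:

```python
import numpy as np

# Classical counterexample (standard material, not from the present text):
# f(x, y) = x*y / (x^2 + y^2), with f(0, 0) = 0, has partial derivatives at
# every point of R^2, yet is not even continuous at the origin, so it cannot
# be differentiable there.
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x**2 + y**2)

h = 1e-8
dfdx = (f(h, 0.0) - f(-h, 0.0)) / (2 * h)   # partial at origin: exactly 0
dfdy = (f(0.0, h) - f(0.0, -h)) / (2 * h)   # partial at origin: exactly 0
diagonal = f(1e-8, 1e-8)                    # value 1/2, however close to 0 we go
```

Thus both partials vanish at the origin, while the function sits at [math]1/2[/math] along the diagonal, arbitrarily close to [math]0[/math]: so some extra condition is indeed needed.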
So, let us discuss now this remaining theoretical question. The result here, which is something quite technical, but which can be useful in practice, is as follows:
For a function [math]f:X\to\mathbb R^M[/math], with [math]X\subset\mathbb R^N[/math], the following conditions are equivalent, and in this case we say that [math]f[/math] is continuously differentiable:
- [math]f[/math] is differentiable, and the map [math]x\to f'(x)[/math] is continuous.
- [math]f[/math] has partial derivatives, which are continuous with respect to [math]x\in X[/math].
If these conditions are satisfied, [math]f'(x)[/math] is the matrix formed by the partial derivatives at [math]x[/math].
We already know, from Theorem 10.20, that the last assertion holds. Regarding now the proof of the equivalence, this goes as follows:
[math](1)\implies(2)[/math] Assuming that [math]f[/math] is differentiable, we know from Theorem 10.20 that [math]f'(x)[/math] is the matrix formed by the partial derivatives at [math]x[/math]. Thus, for any [math]x,y\in X[/math]:
[math]f'(x)-f'(y)=\left(\frac{df_i}{dx_j}(x)-\frac{df_i}{dx_j}(y)\right)_{ij}[/math]
By taking now absolute values, we obtain from this the following estimate:
[math]\left|\frac{df_i}{dx_j}(x)-\frac{df_i}{dx_j}(y)\right|\leq||f'(x)-f'(y)||[/math]
But this gives the result, because if the map [math]x\to f'(x)[/math] is assumed to be continuous, then the partial derivatives are continuous with respect to [math]x\in X[/math] as well.
[math](2)\implies(1)[/math] This is something more technical. For simplicity, let us assume [math]M=1[/math], the proof in general being similar. Given [math]x\in X[/math] and [math]\varepsilon \gt 0[/math], let us pick [math]r \gt 0[/math] such that the ball [math]B=B_x(r)[/math] belongs to [math]X[/math], and such that the following happens, over [math]B[/math]:
[math]\left|\frac{df}{dx_j}(y)-\frac{df}{dx_j}(x)\right| \lt \frac{\varepsilon}{N}\quad,\quad\forall y\in B\,,\,\forall j[/math]
Our claim is that, with this choice made, we have the following estimate, for any [math]t\in\mathbb R^N[/math] satisfying [math]||t|| \lt r[/math], with [math]A[/math] being the vector of partial derivatives at [math]x[/math]:
[math]|f(x+t)-f(x)-At|\leq\varepsilon||t||[/math]
In order to prove this claim, the idea will be that of suitably applying the mean value theorem, over the [math]N[/math] directions of [math]\mathbb R^N[/math]. Indeed, with [math]e_1,\ldots,e_N[/math] being the standard basis of [math]\mathbb R^N[/math], consider the following vectors:
[math]v_0=0\quad,\quad v_j=t_1e_1+\ldots+t_je_j[/math]
In terms of these vectors, we have the following formula:
[math]f(x+t)-f(x)=\sum_{j=1}^N\Big(f(x+v_j)-f(x+v_{j-1})\Big)[/math]
Also, the mean value theorem gives a formula as follows, with [math]s_j\in[0,1][/math]:
[math]f(x+v_j)-f(x+v_{j-1})=t_j\cdot\frac{df}{dx_j}(x+v_{j-1}+s_jt_je_j)[/math]
But, according to our assumption on [math]r \gt 0[/math] from the beginning, the derivative on the right differs from [math]\frac{df}{dx_j}(x)[/math] by something which is smaller than [math]\varepsilon/N[/math]:
[math]\left|\frac{df}{dx_j}(x+v_{j-1}+s_jt_je_j)-\frac{df}{dx_j}(x)\right| \lt \frac{\varepsilon}{N}[/math]
Now by putting everything together, we obtain the following estimate:
[math]|f(x+t)-f(x)-At|=\left|\sum_{j=1}^Nt_j\left(\frac{df}{dx_j}(x+v_{j-1}+s_jt_je_j)-\frac{df}{dx_j}(x)\right)\right|\leq\sum_{j=1}^N|t_j|\cdot\frac{\varepsilon}{N}\leq\varepsilon||t||[/math]
Thus we have proved our claim, and this gives the result.
This was for the basic theory of partial derivatives. In practice, there are far more things that can be said, both at the abstract and the concrete level, including of course many examples. We will be back to this, after developing some more general theory.
Before getting into this, however, let us formulate a definition that you will certainly appreciate, bringing a bit of humanity, and more specifically a good old real number, in this world of vectors, matrices and other beasts which is multivariable calculus:
Given a differentiable function [math]f:\mathbb R^N\to\mathbb R^N[/math], its Jacobian at [math]x\in\mathbb R^N[/math] is the number
[math]\det(f'(x))\in\mathbb R[/math]
Here the first part is standard, because when [math]N=M[/math], as above, the derivative is a linear map [math]f'(x):\mathbb R^N\to\mathbb R^N[/math], which is the same as a square matrix [math]f'(x)\in M_N(\mathbb R)[/math], and so we can consider the determinant of this matrix, [math]\det(f'(x))\in\mathbb R[/math]. As for the second part, this comes from our fine knowledge of the determinant, from chapter 9.
All this is very nice, and as a first observation, according to our formula of [math]f'(x)[/math] as being the matrix formed by the partial derivatives, we have:
[math]\det(f'(x))=\det\left(\frac{df_i}{dx_j}(x)\right)_{ij}[/math]
Thus, the Jacobian can be explicitly computed. However, in what regards the practical uses of the Jacobian, these are quite complicated, and this will have to wait a bit, until chapter 13 below. So, sorry for this, not yet time to enjoy Definition 10.22, and stay with me, plenty of further linear algebra, and matrices instead of numbers, to follow.
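As a small teaser nevertheless, here is the classical polar coordinates computation, sketched numerically in Python; the map and the value of its Jacobian, namely [math]r[/math], are standard material, not specific to the present text:

```python
import numpy as np

# The polar coordinates map f(r, t) = (r cos t, r sin t): its matrix of
# partial derivatives, computed by hand, and the resulting Jacobian, which
# equals r via cos^2 + sin^2 = 1 (a classical computation, for illustration).
def f_prime(r, t):
    return np.array([[np.cos(t), -r * np.sin(t)],
                     [np.sin(t),  r * np.cos(t)]])

r, t = 2.0, 0.7
jac = np.linalg.det(f_prime(r, t))   # equals r = 2.0, up to rounding
```
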
General references
Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].