Derivatives



3a. Derivatives, rules

Welcome to calculus. In this chapter we go for the real thing, namely the development of modern calculus, following some amazing ideas of Newton, Leibniz and others. The material will be quite difficult, mixing geometry and intuition with formal mathematics and computations, and will need some time to be understood. But we will survive.


The basic idea of calculus is very simple. We are interested in functions [math]f:\mathbb R\to\mathbb R[/math], and we already know that when [math]f[/math] is continuous at a point [math]x[/math], we can write an approximation formula as follows, for the values of our function [math]f[/math] around that point [math]x[/math]:

[[math]] f(x+t)\simeq f(x) [[/math]]


The problem now is, how to improve this? A bit of thinking suggests looking at the slope of [math]f[/math] at the point [math]x[/math]. This leads us to the following notion:

Definition

A function [math]f:\mathbb R\to\mathbb R[/math] is called differentiable at [math]x[/math] when the limit

[[math]] f'(x)=\lim_{t\to0}\frac{f(x+t)-f(x)}{t} [[/math]]
called the derivative of [math]f[/math] at the point [math]x[/math], exists.

As a first remark, in order for [math]f[/math] to be differentiable at [math]x[/math], that is to say, in order for the above limit to converge, the numerator must go to [math]0[/math], as the denominator [math]t[/math] does:

[[math]] \lim_{t\to0}\left[f(x+t)-f(x)\right]=0 [[/math]]


Thus, [math]f[/math] must be continuous at [math]x[/math]. However, the converse is not true, a basic counterexample being [math]f(x)=|x|[/math] at [math]x=0[/math]. Let us summarize these findings as follows:

Proposition

If [math]f[/math] is differentiable at [math]x[/math], then [math]f[/math] must be continuous at [math]x[/math]. However, the converse is not true, a basic counterexample being [math]f(x)=|x|[/math], at [math]x=0[/math].


Show Proof

The first assertion is something that we already know, from the above. As for the second assertion, regarding [math]f(x)=|x|[/math], this is something quite clear on the picture of [math]f[/math], but let us prove this mathematically, based on Definition 3.1. We have:

[[math]] \lim_{t\searrow 0}\frac{|0+t|-|0|}{t}=\lim_{t\searrow 0}\frac{t-0}{t}=1 [[/math]]


On the other hand, we have as well the following computation:

[[math]] \lim_{t\nearrow 0}\frac{|0+t|-|0|}{t}=\lim_{t\nearrow 0}\frac{-t-0}{t}=-1 [[/math]]


Thus, the limit in Definition 3.1 does not converge, so we have our counterexample.
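As an aside, the two one-sided limits above can be checked numerically. Here is a quick sketch in Python, our own illustration rather than part of the text:

```python
# Difference quotients of f(x) = |x| at x = 0: they approach +1 from the
# right and -1 from the left, so the two-sided limit does not exist.
f = abs

right = [(f(0 + t) - f(0)) / t for t in (0.1, 0.01, 0.001)]
left = [(f(0 + t) - f(0)) / t for t in (-0.1, -0.01, -0.001)]

print(right)  # every right quotient equals 1.0
print(left)   # every left quotient equals -1.0
```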

Generally speaking, the last assertion in Proposition 3.2 should not bother us much, because most of the basic continuous functions are differentiable, and we will see examples in a moment. Before that, however, let us recall why we are here, namely improving the basic estimate [math]f(x+t)\simeq f(x)[/math]. We can now do this, using the derivative, as follows:

Theorem

Assuming that [math]f[/math] is differentiable at [math]x[/math], we have:

[[math]] f(x+t)\simeq f(x)+f'(x)t [[/math]]
In other words, [math]f[/math] is, approximately, locally affine at [math]x[/math].


Show Proof

Assume indeed that [math]f[/math] is differentiable at [math]x[/math], and let us set, as before:

[[math]] f'(x)=\lim_{t\to0}\frac{f(x+t)-f(x)}{t} [[/math]]


By multiplying by [math]t[/math], we obtain that we have, once again in the [math]t\to0[/math] limit:

[[math]] f(x+t)-f(x)\simeq f'(x)t [[/math]]


Thus, we are led to the conclusion in the statement.
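The affine approximation in Theorem 3.3 can be tested numerically as well. Below is a small Python sketch, with our own choice of test function, [math]f=\sin[/math] at [math]x=1[/math], where [math]f'(x)=\cos 1[/math]; the error should shrink roughly like [math]t^2[/math]:

```python
import math

# Error of the affine approximation f(x+t) ≃ f(x) + f'(x)t, for f = sin
# at x = 1, for decreasing values of t.
x = 1.0
errs = []
for t in (0.1, 0.01, 0.001):
    exact = math.sin(x + t)
    affine = math.sin(x) + math.cos(x) * t
    errs.append(abs(exact - affine))
print(errs)  # errors decrease, roughly by a factor 100 per step
```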

All this is very nice, and before developing more theory, let us work out some examples. As a first illustration, the derivatives of the power functions are as follows:

Proposition

We have the differentiation formula

[[math]] (x^p)'=px^{p-1} [[/math]]
valid for any exponent [math]p\in\mathbb R[/math].


Show Proof

We can do this in three steps, as follows:


(1) In the case [math]p\in\mathbb N[/math] we can use the binomial formula, which gives, as desired:

[[math]] \begin{eqnarray*} (x+t)^p &=&\sum_{k=0}^p\binom{p}{k}x^{p-k}t^k\\ &=&x^p+px^{p-1}t+\ldots+t^p\\ &\simeq&x^p+px^{p-1}t \end{eqnarray*} [[/math]]


(2) Let us discuss now the general case [math]p\in\mathbb Q[/math]. We write [math]p=m/n[/math], with [math]m\in\mathbb Z[/math] and [math]n\in\mathbb N[/math]. In order to do the computation, we use the following formula:

[[math]] a^n-b^n=(a-b)(a^{n-1}+a^{n-2}b+\ldots+b^{n-1}) [[/math]]


We set in this formula [math]a=(x+t)^{m/n}[/math] and [math]b=x^{m/n}[/math]. We obtain, as desired:

[[math]] \begin{eqnarray*} (x+t)^{m/n}-x^{m/n} &=&\frac{(x+t)^m-x^m}{(x+t)^{m(n-1)/n}+\ldots+x^{m(n-1)/n}}\\ &\simeq&\frac{(x+t)^m-x^m}{nx^{m(n-1)/n}}\\ &\simeq&\frac{mx^{m-1}t}{nx^{m(n-1)/n}}\\ &=&\frac{m}{n}\cdot x^{m-1-m+m/n}\cdot t\\ &=&\frac{m}{n}\cdot x^{m/n-1}\cdot t \end{eqnarray*} [[/math]]


(3) In the general case now, where [math]p\in\mathbb R[/math] is real, we can use a similar argument. Indeed, given any integer [math]n\in\mathbb N[/math], we have the following computation:

[[math]] \begin{eqnarray*} (x+t)^p-x^p &=&\frac{(x+t)^{pn}-x^{pn}}{(x+t)^{p(n-1)}+\ldots+x^{p(n-1)}}\\ &\simeq&\frac{(x+t)^{pn}-x^{pn}}{nx^{p(n-1)}} \end{eqnarray*} [[/math]]


Now observe that we have the following estimate, with [math][.][/math] being the integer part:

[[math]] (x+t)^{[pn]}\leq (x+t)^{pn}\leq (x+t)^{[pn]+1} [[/math]]


By using the binomial formula on both sides, for the integer exponents [math][pn][/math] and [math][pn]+1[/math] there, we deduce that with [math]n\gg0[/math] we have the following estimate:

[[math]] (x+t)^{pn}\simeq x^{pn}+pnx^{pn-1}t [[/math]]


Thus, we can finish our computation started above as follows:

[[math]] (x+t)^p-x^p \simeq\frac{pnx^{pn-1}t}{nx^{pn-p}} =px^{p-1}t [[/math]]


But this gives [math](x^p)'=px^{p-1}[/math], which finishes the proof.
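The formula [math](x^p)'=px^{p-1}[/math] can also be spot-checked numerically, for a few real exponents. Here is a sketch in Python; the helper `num_deriv` and the sample point [math]x=2[/math] are our own choices:

```python
# Symmetric difference quotient, a standard numerical derivative.
def num_deriv(f, x, t=1e-6):
    return (f(x + t) - f(x - t)) / (2 * t)

# Compare the numerical derivative of x^p with p*x^(p-1), at x = 2.
x = 2.0
errors = []
for p in (3, 0.5, -1, 1.7):
    numeric = num_deriv(lambda u: u ** p, x)
    formula = p * x ** (p - 1)
    errors.append(abs(numeric - formula))
print(errors)  # all errors are tiny
```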

Here are some further computations, for other basic functions that we know:

Proposition

We have the following results:

  • [math](\sin x)'=\cos x[/math].
  • [math](\cos x)'=-\sin x[/math].
  • [math](e^x)'=e^x[/math].
  • [math](\log x)'=x^{-1}[/math].


Show Proof

This is quite tricky, as always when computing derivatives, as follows:


(1) Regarding [math]\sin[/math], the computation here goes as follows:

[[math]] \begin{eqnarray*} (\sin x)' &=&\lim_{t\to0}\frac{\sin(x+t)-\sin x}{t}\\ &=&\lim_{t\to0}\frac{\sin x\cos t+\cos x\sin t-\sin x}{t}\\ &=&\lim_{t\to0}\sin x\cdot\frac{\cos t-1}{t}+\cos x\cdot\frac{\sin t}{t}\\ &=&\cos x \end{eqnarray*} [[/math]]


Here we have used the fact, clear from a picture of the trigonometric circle, that we have [math]\sin t\simeq t[/math] for [math]t\simeq 0[/math], together with the fact that we have [math]\cos t\simeq 1-t^2/2[/math] for [math]t\simeq 0[/math], which follows from this and from Pythagoras, [math]\sin^2+\cos^2=1[/math].


(2) The computation for [math]\cos[/math] is similar, as follows:

[[math]] \begin{eqnarray*} (\cos x)' &=&\lim_{t\to0}\frac{\cos(x+t)-\cos x}{t}\\ &=&\lim_{t\to0}\frac{\cos x\cos t-\sin x\sin t-\cos x}{t}\\ &=&\lim_{t\to0}\cos x\cdot\frac{\cos t-1}{t}-\sin x\cdot\frac{\sin t}{t}\\ &=&-\sin x \end{eqnarray*} [[/math]]


(3) For the exponential, the derivative can be computed as follows:

[[math]] \begin{eqnarray*} (e^x)' &=&\left(\sum_{k=0}^\infty\frac{x^k}{k!}\right)'\\ &=&\sum_{k=0}^\infty\frac{kx^{k-1}}{k!}\\ &=&e^x \end{eqnarray*} [[/math]]


(4) As for the logarithm, the computation here is as follows, using [math]\log(1+y)\simeq y[/math] for [math]y\simeq 0[/math], which follows from [math]e^y\simeq 1+y[/math] that we found in (3), by taking the logarithm:

[[math]] \begin{eqnarray*} (\log x)' &=&\lim_{t\to0}\frac{\log(x+t)-\log x}{t}\\ &=&\lim_{t\to0}\frac{\log(1+t/x)}{t}\\ &=&\frac{1}{x} \end{eqnarray*} [[/math]]


Thus, we are led to the formulae in the statement.
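All four formulae can be spot-checked numerically, say at [math]x=0.7[/math]; the sketch below, in Python, is our own illustration:

```python
import math

# Symmetric difference quotient.
def num_deriv(f, x, t=1e-6):
    return (f(x + t) - f(x - t)) / (2 * t)

# Pairs (numerical derivative, claimed formula), at x = 0.7.
x = 0.7
checks = [
    (num_deriv(math.sin, x), math.cos(x)),   # (sin x)' = cos x
    (num_deriv(math.cos, x), -math.sin(x)),  # (cos x)' = -sin x
    (num_deriv(math.exp, x), math.exp(x)),   # (e^x)' = e^x
    (num_deriv(math.log, x), 1 / x),         # (log x)' = 1/x
]
for lhs, rhs in checks:
    print(lhs, rhs)
```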

Speaking of exponentials, we can now formulate a nice result about them:

Theorem

The exponential function, namely

[[math]] e^x=\sum_{k=0}^\infty\frac{x^k}{k!} [[/math]]
is the unique power series satisfying [math]f'=f[/math] and [math]f(0)=1[/math].


Show Proof

Consider indeed a power series satisfying [math]f'=f[/math] and [math]f(0)=1[/math]. Due to [math]f(0)=1[/math], the first term must be 1, and so our function must look as follows:

[[math]] f(x)=1+\sum_{k=1}^\infty c_kx^k [[/math]]


According to our differentiation rules, the derivative of this series is given by:

[[math]] f'(x)=\sum_{k=1}^\infty kc_kx^{k-1} [[/math]]


Thus, the equation [math]f'=f[/math] is equivalent to the following equalities:

[[math]] c_1=1\quad,\quad 2c_2=c_1\quad,\quad 3c_3=c_2\quad,\quad 4c_4=c_3\quad,\quad\ldots [[/math]]


But this system of equations can be solved by recurrence, as follows:

[[math]] c_1=1\quad,\quad c_2=\frac{1}{2}\quad,\quad c_3=\frac{1}{2\times 3}\quad,\quad c_4=\frac{1}{2\times 3\times 4}\quad,\quad\ldots [[/math]]


Thus we have [math]c_k=1/k![/math], leading to the conclusion in the statement.
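The recurrence in the proof, namely [math]kc_k=c_{k-1}[/math] with [math]c_1=1[/math], can be solved by machine too. A quick sketch in Python, our own illustration:

```python
import math

# Solve the recurrence k*c_k = c_{k-1}, c_1 = 1, and compare with 1/k!.
c = {1: 1.0}
for k in range(2, 10):
    c[k] = c[k - 1] / k

for k in range(1, 10):
    print(k, c[k], 1 / math.factorial(k))
```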

Observe that the above result leads to a more conceptual explanation for the number [math]e[/math] itself. To be more precise, [math]e\in\mathbb R[/math] is the unique number satisfying:

[[math]] (e^x)'=e^x [[/math]]


Let us work out now some general results. We have here the following statement:

Theorem

We have the following formulae:

  • [math](f+g)'=f'+g'[/math].
  • [math](fg)'=f'g+fg'[/math].
  • [math](f\circ g)'=(f'\circ g)\cdot g'[/math].


Show Proof

All these formulae are elementary, the idea being as follows:


(1) This follows indeed from definitions, the computation being as follows:

[[math]] \begin{eqnarray*} (f+g)'(x) &=&\lim_{t\to0}\frac{(f+g)(x+t)-(f+g)(x)}{t}\\ &=&\lim_{t\to0}\left(\frac{f(x+t)-f(x)}{t}+\frac{g(x+t)-g(x)}{t}\right)\\ &=&\lim_{t\to0}\frac{f(x+t)-f(x)}{t}+\lim_{t\to0}\frac{g(x+t)-g(x)}{t}\\ &=&f'(x)+g'(x) \end{eqnarray*} [[/math]]


(2) This follows from definitions too, the computation, by using the more convenient formula [math]f(x+t)\simeq f(x)+f'(x)t[/math] as a definition for the derivative, being as follows:

[[math]] \begin{eqnarray*} (fg)(x+t) &=&f(x+t)g(x+t)\\ &\simeq&(f(x)+f'(x)t)(g(x)+g'(x)t)\\ &\simeq&f(x)g(x)+(f'(x)g(x)+f(x)g'(x))t \end{eqnarray*} [[/math]]


Indeed, we obtain from this that the derivative is the coefficient of [math]t[/math], namely:

[[math]] (fg)'(x)=f'(x)g(x)+f(x)g'(x) [[/math]]


(3) Regarding compositions, the computation here is as follows, again by using the more convenient formula [math]f(x+t)\simeq f(x)+f'(x)t[/math] as a definition for the derivative:

[[math]] \begin{eqnarray*} (f\circ g)(x+t) &=&f(g(x+t))\\ &\simeq&f(g(x)+g'(x)t)\\ &\simeq&f(g(x))+f'(g(x))g'(x)t \end{eqnarray*} [[/math]]


Indeed, we obtain from this that the derivative is the coefficient of [math]t[/math], namely:

[[math]] (f\circ g)'(x)=f'(g(x))g'(x) [[/math]]


Thus, we are led to the conclusions in the statement.
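The product and chain rules can be spot-checked numerically, say on [math]f(x)=x^2[/math] and [math]g(x)=\sin x[/math], at [math]x=1.3[/math]; the Python sketch below, with our own helper `num_deriv`, is an illustration rather than part of the text:

```python
import math

# Symmetric difference quotient.
def num_deriv(h, x, t=1e-6):
    return (h(x + t) - h(x - t)) / (2 * t)

x = 1.3
# Product rule: (fg)' = f'g + fg', with f(x) = x^2, g(x) = sin x.
prod_num = num_deriv(lambda u: u ** 2 * math.sin(u), x)
prod_rule = 2 * x * math.sin(x) + x ** 2 * math.cos(x)
# Chain rule: (f∘g)' = (f'∘g)·g', here ((sin x)^2)' = 2 sin x cos x.
chain_num = num_deriv(lambda u: math.sin(u) ** 2, x)
chain_rule = 2 * math.sin(x) * math.cos(x)
print(prod_num, prod_rule)
print(chain_num, chain_rule)
```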

We can of course combine the above formulae, and we obtain for instance:

Proposition

The derivatives of fractions are given by:

[[math]] \left(\frac{f}{g}\right)'=\frac{f'g-fg'}{g^2} [[/math]]
In particular, we have the following formula, for the derivative of inverses:

[[math]] \left(\frac{1}{f}\right)'=-\frac{f'}{f^2} [[/math]]
In fact, we have [math](f^p)'=pf^{p-1}f'[/math], for any exponent [math]p\in\mathbb R[/math].


Show Proof

This statement is written a bit upside down, and for the proof it is better to proceed backwards. To be more precise, by using [math](x^p)'=px^{p-1}[/math] and Theorem 3.7 (3), we obtain the third formula. Then, with [math]p=-1[/math], we obtain from this the second formula. And finally, by using this second formula and Theorem 3.7 (2), we obtain:

[[math]] \begin{eqnarray*} \left(\frac{f}{g}\right)' &=&\left(f\cdot\frac{1}{g}\right)'\\ &=&f'\cdot\frac{1}{g}+f\left(\frac{1}{g}\right)'\\ &=&\frac{f'}{g}-\frac{fg'}{g^2}\\ &=&\frac{f'g-fg'}{g^2} \end{eqnarray*} [[/math]]


Thus, we are led to the formulae in the statement.

All the above might start to seem a bit too complex, with too many things to be memorized and so on, and as a piece of advice here, we have:

Advice

Memorize and cherish the formula for fractions

[[math]] \left(\frac{f}{g}\right)'=\frac{f'g-fg'}{g^2} [[/math]]

along with the usual addition formula, that you know well

[[math]] \frac{a}{b}+\frac{c}{d}=\frac{ad+bc}{bd} [[/math]]

and generally speaking, never mess with fractions.

With this coming from a lifelong calculus teacher and scientist: mathematics can be difficult, and many things can be pardoned, but not messing with fractions. And this goes beyond mathematics too. If you want to make a living by selling apples or tomatoes at the market, fine, but you will need to know fractions well, trust me.


Back to work now, with the above formulae in hand, we can do all sorts of computations for other basic functions that we know, including [math]\tan x[/math], or [math]\arctan x[/math]:

Proposition

We have the following formulae,

[[math]] (\tan x)'=\frac{1}{\cos^2x}\quad,\quad (\arctan x)'=\frac{1}{1+x^2} [[/math]]
and the derivatives of the remaining trigonometric functions can be computed as well.


Show Proof

For [math]\tan[/math], we have the following computation:

[[math]] \begin{eqnarray*} (\tan x)' &=&\left(\frac{\sin x}{\cos x}\right)'\\ &=&\frac{\sin'x\cos x-\sin x\cos'x}{\cos^2x}\\ &=&\frac{\cos^2x+\sin^2x}{\cos^2x}\\ &=&\frac{1}{\cos^2x} \end{eqnarray*} [[/math]]


As for [math]\arctan[/math], we can use here the following computation:

[[math]] \begin{eqnarray*} (\tan\circ\arctan)'(x) &=&\tan'(\arctan x)\arctan'(x)\\ &=&\frac{1}{\cos^2(\arctan x)}\arctan'(x) \end{eqnarray*} [[/math]]


Indeed, since the term on the left is simply [math]x'=1[/math], we obtain from this:

[[math]] \arctan'(x)=\cos^2(\arctan x) [[/math]]


On the other hand, with [math]t=\arctan x[/math] we know that we have [math]\tan t=x[/math], and so:

[[math]] \cos^2(\arctan x)=\cos^2t=\frac{1}{1+\tan^2t}=\frac{1}{1+x^2} [[/math]]


Thus, we are led to the formula in the statement, namely:

[[math]] (\arctan x)'=\frac{1}{1+x^2} [[/math]]


As for the last assertion, we will leave this as an exercise.
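Both formulae can be verified numerically as well; here is a quick Python sketch, with the helper `num_deriv` being our own:

```python
import math

# Symmetric difference quotient.
def num_deriv(f, x, t=1e-6):
    return (f(x + t) - f(x - t)) / (2 * t)

# Compare with (tan x)' = 1/cos^2 x and (arctan x)' = 1/(1+x^2), at x = 0.4.
x = 0.4
tan_err = abs(num_deriv(math.tan, x) - 1 / math.cos(x) ** 2)
atan_err = abs(num_deriv(math.atan, x) - 1 / (1 + x ** 2))
print(tan_err, atan_err)  # both errors are tiny
```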

At the theoretical level now, further building on Theorem 3.3, we have:

Theorem

The local minima and maxima of a differentiable function [math]f:\mathbb R\to\mathbb R[/math] appear at the points [math]x\in\mathbb R[/math] where:

[[math]] f'(x)=0 [[/math]]
However, the converse of this fact is not true in general.


Show Proof

The first assertion follows from the formula in Theorem 3.3, namely:

[[math]] f(x+t)\simeq f(x)+f'(x)t [[/math]]


Indeed, let us rewrite this formula, more conveniently, in the following way:

[[math]] f(x+t)-f(x)\simeq f'(x)t [[/math]]


Now saying that our function [math]f[/math] has a local maximum at [math]x\in\mathbb R[/math] means that there exists a number [math]\varepsilon \gt 0[/math] such that the following happens:

[[math]] f(x+t)\leq f(x)\quad,\quad\forall t\in[-\varepsilon,\varepsilon] [[/math]]


We conclude that we must have [math]f'(x)t\leq0[/math] for sufficiently small [math]t[/math], and since this small [math]t[/math] can be both positive and negative, this gives, as desired:

[[math]] f'(x)=0 [[/math]]


Similarly, saying that our function [math]f[/math] has a local minimum at [math]x\in\mathbb R[/math] means that there exists a number [math]\varepsilon \gt 0[/math] such that the following happens:

[[math]] f(x+t)\geq f(x)\quad,\quad\forall t\in[-\varepsilon,\varepsilon] [[/math]]


Thus [math]f'(x)t\geq0[/math] for small [math]t[/math], and this gives, as before, [math]f'(x)=0[/math]. Finally, regarding the converse, the simplest counterexample here is the following function:

[[math]] f(x)=x^3 [[/math]]


Indeed, we have [math]f'(x)=3x^2[/math], and in particular [math]f'(0)=0[/math]. But our function being clearly increasing, [math]x=0[/math] is not a local maximum, nor a local minimum.

As an important consequence of Theorem 3.11, we have:

Theorem

Assuming that [math]f:[a,b]\to\mathbb R[/math] is differentiable, we have

[[math]] \frac{f(b)-f(a)}{b-a}=f'(c) [[/math]]
for some [math]c\in(a,b)[/math], called mean value property of [math]f[/math].


Show Proof

In the case [math]f(a)=f(b)[/math], the result, called Rolle's theorem, states that we have [math]f'(c)=0[/math] for some [math]c\in(a,b)[/math], and follows from Theorem 3.11. Now regarding our statement, due to Lagrange, this follows from Rolle, applied to the following function:

[[math]] g(x)=f(x)-\frac{f(b)-f(a)}{b-a}\cdot x [[/math]]


Indeed, we have [math]g(a)=g(b)[/math], due to our choice of the constant on the right, so we get [math]g'(c)=0[/math] for some [math]c\in(a,b)[/math], which translates into the formula in the statement.
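As a numerical illustration of the mean value property, take [math]f(x)=x^3[/math] on [math][0,2][/math]. Then [math]f'(c)=3c^2[/math] must equal [math](f(2)-f(0))/2=4[/math], so [math]c=\sqrt{4/3}[/math], and this [math]c[/math] can be located by bisection; the Python sketch below is our own illustration:

```python
import math

# Mean value property for f(x) = x^3 on [0, 2]: find c with f'(c) = 4.
fprime = lambda c: 3 * c ** 2
target = (2 ** 3 - 0 ** 3) / (2 - 0)  # slope of the chord, = 4.0

# Bisection on fprime(c) - target, which changes sign on (0, 2).
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) < target:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2
print(c, math.sqrt(4 / 3))  # the two values agree
```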

In practice, Theorem 3.11 can be used in order to find the maximum and minimum of any differentiable function, and this method is best recalled as follows:

Algorithm

In order to find the minimum and maximum of [math]f:[a,b]\to\mathbb R[/math]:

  • Compute the derivative [math]f'[/math].
  • Solve the equation [math]f'(x)=0[/math].
  • Add [math]a,b[/math] to your set of solutions.
  • Compute [math]f(x)[/math], for all your solutions.
  • Compute the min/max of all these [math]f(x)[/math] values.
  • Then this is the min/max of your function.

To be more precise, we are using here Theorem 3.11, or rather the obvious extension of this result to the case of the functions [math]f:[a,b]\to\mathbb R[/math]. This tells us that the local minima and maxima of our function [math]f[/math], and in particular the global minima and maxima, can be found among the zeroes of the first derivative [math]f'[/math], with the endpoints [math]a,b[/math] added. Thus, what we have to do is to compute these “candidates”, as explained in steps (1-2-3), and then see what each candidate is exactly worth, as explained in steps (4-5-6).
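The six steps above can be sketched in code, for a concrete example of our own choosing, namely [math]f(x)=x^3-3x[/math] on [math][-2,3][/math], where [math]f'(x)=3x^2-3[/math] vanishes at [math]x=\pm1[/math]:

```python
# The min/max recipe for f(x) = x^3 - 3x on [a, b] = [-2, 3].
f = lambda x: x ** 3 - 3 * x

candidates = [-1.0, 1.0]   # zeros of f'(x) = 3x^2 - 3, solved by hand here
candidates += [-2.0, 3.0]  # add the endpoints a, b

values = {x: f(x) for x in candidates}  # evaluate f at all candidates
fmin = min(values.values())             # global minimum on [a, b]
fmax = max(values.values())             # global maximum on [a, b]
print(values, fmin, fmax)
```

In general, of course, step (2) would require a numerical root-finder rather than solving [math]f'(x)=0[/math] by hand.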


Needless to say, all this is very interesting, and powerful. The general problem in any type of applied mathematics is that of finding the minimum or maximum of some function, and we have now an algorithm for dealing with such questions. Very nice.

3b. Second derivatives

The derivative theory that we have is already quite powerful, and can be used in order to solve all sorts of interesting questions, but with a bit more effort, we can do better. Indeed, at a more advanced level, we can come up with the following notion:

Definition

We say that [math]f:\mathbb R\to\mathbb R[/math] is twice differentiable if it is differentiable, and its derivative [math]f':\mathbb R\to\mathbb R[/math] is differentiable too. The derivative of [math]f'[/math] is denoted

[[math]] f'':\mathbb R\to\mathbb R [[/math]]

and is called second derivative of [math]f[/math].

You might wonder why we introduce this notion, which looks a bit abstract and complicated, instead of further developing the theory of the first derivative, which looks like something very reasonable and useful. Good point, and the answer to this is coming in a moment. But before that, let us get a bit familiar with [math]f''[/math]. We first have:

Interpretation

The second derivative [math]f''(x)\in\mathbb R[/math] is the number which:

  • Expresses the growth rate of the slope [math]f'(z)[/math] at the point [math]x[/math].
  • Gives us the acceleration of the function [math]f[/math] at the point [math]x[/math].
  • Computes how different [math]f(x)[/math] is, compared to [math]f(z)[/math] with [math]z\simeq x[/math].
  • Tells us how convex or concave [math]f[/math] is, around the point [math]x[/math].

So, this is the truth about the second derivative, making it clear that what we have here is a very interesting notion. In practice now, (1) follows from the usual interpretation of the derivative, as both a growth rate, and a slope. Regarding (2), this is some sort of reformulation of (1), using the intuitive meaning of the word “acceleration”, with the relevant physics equations, due to Newton, being as follows:

[[math]] v=\dot{x}\quad,\quad a=\dot{v} [[/math]]


To be more precise, here [math]x,v,a[/math] are the position, speed and acceleration, and the dot denotes the time derivative, and according to these equations, we have [math]a=\ddot{x}[/math], second derivative. We will be back to these equations at the end of the present chapter.


Regarding now (3) in the above, this is something more subtle, of statistical nature, that we will clarify with some mathematics, in a moment. As for (4), this is something quite subtle too, that we will again clarify with some mathematics, in a moment.


All in all, what we have above is a mixture of trivial and non-trivial facts, and do not worry, we will get familiar with all this, in the next few pages.


In practice now, let us first compute the second derivatives of the functions that we are familiar with, and see what we get. The result here, which is perhaps not very enlightening at this stage of things, but which certainly looks technically useful, is as follows:

Proposition

The second derivatives of the basic functions are as follows:

  • [math](x^p)''=p(p-1)x^{p-2}[/math].
  • [math]\sin''=-\sin[/math].
  • [math]\cos''=-\cos[/math].
  • [math]\exp''=\exp[/math].
  • [math]\log''(x)=-1/x^2[/math].

Also, there are functions which are differentiable, but not twice differentiable.


Show Proof

We have several assertions here, the idea being as follows:


(1) Regarding the various formulae in the statement, these all follow from the various formulae for the derivatives established before, as follows:

[[math]] (x^p)''=(px^{p-1})'=p(p-1)x^{p-2} [[/math]]

[[math]] (\sin x)''=(\cos x)'=-\sin x [[/math]]

[[math]] (\cos x)''=(-\sin x)'=-\cos x [[/math]]

[[math]] (e^x)''=(e^x)'=e^x [[/math]]

[[math]] (\log x)''=(1/x)'=-1/x^2 [[/math]]


Of course, this is not the end of the story, because these formulae remain quite opaque, and must be examined in view of Interpretation 3.15, in order to see what exactly is going on. Also, we have [math]\tan[/math] and the inverse trigonometric functions too. In short, plenty of good exercises here, for you, and the more you solve, the better your calculus will be.


(2) Regarding now the counterexample, recall first that the simplest example of a function which is continuous, but not differentiable, was [math]f(x)=|x|[/math], the idea behind this being to use a “piecewise linear function whose branches do not fit well”. In connection now with our question, piecewise linear will not do, but we can use a similar idea, namely “piecewise quadratic function whose branches do not fit well”. So, let us set:

[[math]] f(x)=\begin{cases} ax^2& (x\leq0)\\ bx^2& (x\geq 0) \end{cases} [[/math]]


This function is then differentiable, with its derivative being:

[[math]] f'(x)=\begin{cases} 2ax& (x\leq0)\\ 2bx& (x\geq 0) \end{cases} [[/math]]


Now for getting our counterexample, we can set [math]a=-1,b=1[/math], so that [math]f[/math] is:

[[math]] f(x)=\begin{cases} -x^2& (x\leq0)\\ x^2& (x\geq 0) \end{cases} [[/math]]


Indeed, the derivative is [math]f'(x)=2|x|[/math], which is not differentiable, as desired.

Getting now to theory, we first have the following key result:

Theorem

Any twice differentiable function [math]f:\mathbb R\to\mathbb R[/math] is locally quadratic,

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{f''(x)}{2}\,t^2 [[/math]]
with [math]f''(x)[/math] being as usual the derivative of the function [math]f':\mathbb R\to\mathbb R[/math] at the point [math]x[/math].


Show Proof

Assume indeed that [math]f[/math] is twice differentiable at [math]x[/math], and let us try to construct an approximation of [math]f[/math] around [math]x[/math] by a quadratic function, as follows:

[[math]] f(x+t)\simeq a+bt+ct^2 [[/math]]


We must have [math]a=f(x)[/math], and we also know from Theorem 3.3 that [math]b=f'(x)[/math] is the correct choice for the coefficient of [math]t[/math]. Thus, our approximation must be as follows:

[[math]] f(x+t)\simeq f(x)+f'(x)t+ct^2 [[/math]]


In order to find the correct choice for [math]c\in\mathbb R[/math], observe that the function [math]t\to f(x+t)[/math] matches with [math]t\to f(x)+f'(x)t+ct^2[/math] in what regards the value at [math]t=0[/math], and also in what regards the value of the derivative at [math]t=0[/math]. Thus, the correct choice of [math]c\in\mathbb R[/math] should be the one making match the second derivatives at [math]t=0[/math], and this gives:

[[math]] f''(x)=2c [[/math]]


We are therefore led to the formula in the statement, namely:

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{f''(x)}{2}\,t^2 [[/math]]


In order to prove now that this formula holds indeed, we will use L'Hôpital's rule, which states that the [math]0/0[/math] type limits can be computed as follows:

[[math]] \frac{f(x)}{g(x)}\simeq\frac{f'(x)}{g'(x)} [[/math]]


Observe that this formula holds indeed, as an application of Theorem 3.3. Now by using this, if we write the formula to be proved as [math]\varphi(t)\simeq P(t)[/math], we have:

[[math]] \begin{eqnarray*} \frac{\varphi(t)-P(t)}{t^2} &\simeq&\frac{\varphi'(t)-P'(t)}{2t}\\ &\simeq&\frac{\varphi''(t)-P''(t)}{2}\\ &=&\frac{f''(x)-f''(x)}{2}\\ &=&0 \end{eqnarray*} [[/math]]


Thus, we are led to the conclusion in the statement.
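The gain over Theorem 3.3 is visible numerically. Taking [math]f=\exp[/math] at [math]x=0[/math], our own test case, the quadratic approximation [math]e^t\simeq 1+t+t^2/2[/math] has an error of order [math]t^3[/math], much smaller than that of the affine one:

```python
import math

# Errors of the affine and quadratic approximations of e^t near t = 0.
for t in (0.1, 0.01):
    affine = abs(math.exp(t) - (1 + t))
    quad = abs(math.exp(t) - (1 + t + t ** 2 / 2))
    print(t, affine, quad)  # quad is much smaller than affine
```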

The above result substantially improves Theorem 3.3, and there are many applications of it. As a first such application, justifying Interpretation 3.15 (3), we have the following statement, which is a bit heuristic, but we will call it however Proposition:

Proposition

Intuitively speaking, the second derivative [math]f''(x)\in\mathbb R[/math] computes how different [math]f(x)[/math] is, compared to the average of [math]f(z)[/math], with [math]z\simeq x[/math].


Show Proof

As already mentioned, this is something a bit heuristic, but which is good to know. Let us write the formula in Theorem 3.17, as such, and with [math]t\to-t[/math] too:

[[math]] f(x+t)\simeq f(x)+f'(x)t+\frac{f''(x)}{2}\,t^2 [[/math]]

[[math]] f(x-t)\simeq f(x)-f'(x)t+\frac{f''(x)}{2}\,t^2 [[/math]]


By making the average, we obtain the following formula:

[[math]] \frac{f(x+t)+f(x-t)}{2}\simeq f(x)+\frac{f''(x)}{2}\,t^2 [[/math]]


Now assume that we have found a way of averaging things over [math]t\in[-\varepsilon,\varepsilon][/math], with the corresponding averages being denoted [math]I[/math]. We obtain from the above:

[[math]] I(f)=f(x)+f''(x)I\left(\frac{t^2}{2}\right) [[/math]]


But this is what our statement says, save for some uncertainties regarding the averaging method, and the precise value of [math]I(t^2/2)[/math]. We will leave this for later.

Back to rigorous mathematics now, with apologies for the physics intermezzo. But Proposition 3.18 is really cool, isn't it, and we will be back to it later with full mathematical details, after developing more theory, that is promised. As a second application of Theorem 3.17, we can improve as well Theorem 3.11, as follows:

Theorem

The local minima and local maxima of a twice differentiable function [math]f:\mathbb R\to\mathbb R[/math] appear at the points [math]x\in\mathbb R[/math] where

[[math]] f'(x)=0 [[/math]]
with the local minima corresponding to the case [math]f''(x)\geq0[/math], and with the local maxima corresponding to the case [math]f''(x)\leq0[/math].


Show Proof

The first assertion is something that we already know. As for the second assertion, we can use the formula in Theorem 3.17, which in the case [math]f'(x)=0[/math] reads:

[[math]] f(x+t)\simeq f(x)+\frac{f''(x)}{2}\,t^2 [[/math]]


Indeed, assuming [math]f''(x)\neq 0[/math], it is clear that the condition [math]f''(x) \gt 0[/math] will produce a local minimum, and that the condition [math]f''(x) \lt 0[/math] will produce a local maximum.
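As an illustration, take [math]f(x)=x^3-3x[/math], with [math]f'(x)=3x^2-3[/math] and [math]f''(x)=6x[/math]; the second derivative test classifies the critical points [math]x=\pm1[/math]. Here is a sketch in Python, with the function `classify` being our own:

```python
# Second derivative test for f(x) = x^3 - 3x, with f''(x) = 6x.
fpp = lambda x: 6 * x

def classify(x):
    # Assumes x is a critical point, i.e. f'(x) = 0.
    if fpp(x) > 0:
        return "local minimum"
    if fpp(x) < 0:
        return "local maximum"
    return "undetermined"

print(classify(1.0))   # f''(1) = 6 > 0
print(classify(-1.0))  # f''(-1) = -6 < 0
```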

As before with Theorem 3.11, the above result is not the end of the story with the mathematics of the local minima and maxima, because things are undetermined when:

[[math]] f'(x)=f''(x)=0 [[/math]]


For instance the functions [math]\pm x^n[/math] with [math]n\in\mathbb N[/math] all satisfy this condition at [math]x=0[/math], which is a minimum for the functions of type [math]x^{2m}[/math], a maximum for the functions of type [math]-x^{2m}[/math], and not a local minimum or local maximum for the functions of type [math]\pm x^{2m+1}[/math].


There are some comments to be made in relation with Algorithm 3.13 as well. Normally that algorithm stays strong, because Theorem 3.19 can only help with the final steps, and it is rarely worth computing the second derivative [math]f''[/math] just for getting rid of roughly [math]1/2[/math] of the [math]f(x)[/math] values to be compared. However, in certain cases, this method proves to be useful, so Theorem 3.19 is good to know, when applying that algorithm.


As a main concrete application now of the second derivative, which is something very useful in practice, and related to Interpretation 3.15 (4), we have the following result:

Theorem

Given a convex function [math]f:\mathbb R\to\mathbb R[/math], we have the following Jensen inequality, for any [math]x_1,\ldots,x_N\in\mathbb R[/math], and any [math]\lambda_1,\ldots,\lambda_N \gt 0[/math] summing up to [math]1[/math],

[[math]] f(\lambda_1x_1+\ldots+\lambda_Nx_N)\leq\lambda_1f(x_1)+\ldots+\lambda_Nf(x_N) [[/math]]
with equality when [math]x_1=\ldots=x_N[/math]. In particular, by taking the weights [math]\lambda_i[/math] to be all equal, we obtain the following Jensen inequality, valid for any [math]x_1,\ldots,x_N\in\mathbb R[/math],

[[math]] f\left(\frac{x_1+\ldots+x_N}{N}\right)\leq\frac{f(x_1)+\ldots+f(x_N)}{N} [[/math]]
and once again with equality when [math]x_1=\ldots=x_N[/math]. A similar statement holds for the concave functions, with all the inequalities being reversed.


Show Proof

This is indeed something quite routine, the idea being as follows:


(1) First, we can talk about convex functions in a usual, intuitive way, with this meaning by definition that the following inequality must be satisfied:

[[math]] f\left(\frac{x+y}{2}\right)\leq\frac{f(x)+f(y)}{2} [[/math]]


(2) But this means, via a simple argument, by approximating numbers [math]t\in[0,1][/math] by sums of powers [math]2^{-k}[/math], that for any [math]t\in[0,1][/math] we must have:

[[math]] f(tx+(1-t)y)\leq tf(x)+(1-t)f(y) [[/math]]


Alternatively, via yet another simple argument, this time by doing some geometry with triangles, this means that we must have:

[[math]] f\left(\frac{x_1+\ldots+x_N}{N}\right)\leq\frac{f(x_1)+\ldots+f(x_N)}{N} [[/math]]


But then, again alternatively, by combining the above two simple arguments, the following must happen, for any [math]\lambda_1,\ldots,\lambda_N \gt 0[/math] summing up to [math]1[/math]:

[[math]] f(\lambda_1x_1+\ldots+\lambda_Nx_N)\leq\lambda_1f(x_1)+\ldots+\lambda_Nf(x_N) [[/math]]


(3) Summarizing, all our Jensen inequalities, at [math]N=2[/math] and at [math]N\in\mathbb N[/math] arbitrary, are equivalent. The point now is that, if we look at what the first Jensen inequality, that we took as definition for the convexity, exactly means, this is simply equivalent to:

[[math]] f''(x)\geq0 [[/math]]


(4) Thus, we are led to the conclusions in the statement, regarding the convex functions. As for the concave functions, the proof here is similar. Alternatively, we can say that [math]f[/math] is concave precisely when [math]-f[/math] is convex, and get the results from what we have.
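The Jensen inequality can be tested numerically, say for the convex function [math]f(x)=x^2[/math], with random points and random weights summing up to [math]1[/math]; the sketch below, in Python, is our own illustration:

```python
import random

# Jensen inequality for the convex function f(x) = x^2.
random.seed(0)  # fixed seed, for reproducibility
f = lambda x: x * x

xs = [random.uniform(-5, 5) for _ in range(6)]   # random points
ws = [random.uniform(0, 1) for _ in range(6)]    # random positive weights
total = sum(ws)
ws = [w / total for w in ws]                     # normalize: sum = 1

lhs = f(sum(w * x for w, x in zip(ws, xs)))      # f of the weighted mean
rhs = sum(w * f(x) for w, x in zip(ws, xs))      # weighted mean of f
print(lhs, rhs)  # lhs <= rhs, by Jensen
```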

As a basic application of the Jensen inequality, which is very classical, we have:

Theorem

For any [math]p\in(1,\infty)[/math] we have the following inequality,

[[math]] \left|\frac{x_1+\ldots+x_N}{N}\right|^p\leq\frac{|x_1|^p+\ldots+|x_N|^p}{N} [[/math]]
and for any [math]p\in(0,1)[/math] we have the following inequality, this time valid for [math]x_1,\ldots,x_N\geq0[/math],

[[math]] \left|\frac{x_1+\ldots+x_N}{N}\right|^p\geq\frac{|x_1|^p+\ldots+|x_N|^p}{N} [[/math]]
with in both cases equality precisely when [math]|x_1|=\ldots=|x_N|[/math].


Show Proof

This follows indeed from Theorem 3.20, because we have:

[[math]] (x^p)''=p(p-1)x^{p-2} [[/math]]


Thus [math]x^p[/math] is convex for [math]p \gt 1[/math] and concave for [math]p \lt 1[/math], which gives the results.
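As a numerical sanity check of these power mean inequalities, here is a short Python sketch (illustration only; for the [math]p\in(0,1)[/math] direction nonnegative entries are used, as the concavity argument requires):

```python
# Check of the p-th power mean inequalities, on random inputs.
import random

def mean_pow(xs, p):
    """Mean of |x_i|^p."""
    return sum(abs(x) ** p for x in xs) / len(xs)

random.seed(1)
for _ in range(1000):
    xs = [random.uniform(-5, 5) for _ in range(4)]
    # p = 3 > 1: |mean|^p <= mean of |x_i|^p
    assert abs(sum(xs) / len(xs)) ** 3 <= mean_pow(xs, 3) + 1e-9
    # p = 1/2 in (0,1), nonnegative entries: mean^p >= mean of x_i^p
    ys = [abs(x) for x in xs]
    assert (sum(ys) / len(ys)) ** 0.5 >= mean_pow(ys, 0.5) - 1e-9
```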

Observe that at [math]p=2[/math] we obtain as particular case of the above inequality the Cauchy-Schwarz inequality, or rather something equivalent to it, namely:

[[math]] \left(\frac{x_1+\ldots+x_N}{N}\right)^2\leq\frac{x_1^2+\ldots+x_N^2}{N} [[/math]]


We will be back to this later on in this book, when talking about scalar products and Hilbert spaces, with some more conceptual proofs for such inequalities.


Finally, as yet another important application of the Jensen inequality, we have:

Theorem

We have the Young inequality,

[[math]] ab\leq \frac{a^p}{p}+\frac{b^q}{q} [[/math]]
valid for any [math]a,b\geq0[/math], and any exponents [math]p,q \gt 1[/math] satisfying [math]\frac{1}{p}+\frac{1}{q}=1[/math].


Show Proof

We use the logarithm function, which is concave on [math](0,\infty)[/math], due to:

[[math]] (\log x)''=\left(-\frac{1}{x}\right)'=-\frac{1}{x^2} [[/math]]


Thus we can apply the Jensen inequality, and we obtain in this way:

[[math]] \begin{eqnarray*} \log\left(\frac{a^p}{p}+\frac{b^q}{q}\right) &\geq&\frac{\log(a^p)}{p}+\frac{\log(b^q)}{q}\\ &=&\log(a)+\log(b)\\ &=&\log(ab) \end{eqnarray*} [[/math]]


Now by exponentiating, we obtain the Young inequality.
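Here is a quick numerical check of the Young inequality, as a Python sketch (illustration only, with our own function names):

```python
# Check of the Young inequality ab <= a^p/p + b^q/q, with 1/p + 1/q = 1.
import random

def young_gap(a, b, p):
    """Return a^p/p + b^q/q - ab, with q the conjugate exponent of p."""
    q = p / (p - 1)          # so that 1/p + 1/q = 1
    return a ** p / p + b ** q / q - a * b

random.seed(2)
for _ in range(1000):
    a, b = random.uniform(0, 10), random.uniform(0, 10)
    p = random.uniform(1.1, 5)
    assert young_gap(a, b, p) >= -1e-9

# Equality holds precisely when a^p = b^q, for instance:
a, p = 2.0, 3.0
b = a ** (p - 1)             # then b^q = a^p = 8
assert abs(young_gap(a, b, p)) < 1e-9
```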

Observe that for the simplest exponents, namely [math]p=q=2[/math], the Young inequality gives something which is elementary, but very useful in practice, namely:

[[math]] ab\leq\frac{a^2+b^2}{2} [[/math]]


In general, the Young inequality is something non-trivial, and the idea with it is that “when stuck with a problem, and with [math]ab\leq\frac{a^2+b^2}{2}[/math] not working, try Young”. We will be back to this general principle, later in this book, with some illustrations.

3c. The Taylor formula

Back now to the general theory of the derivatives, and their theoretical applications, we can further develop our basic approximation method, at order 3, at order 4, and so on, the ultimate result on the subject, called Taylor formula, being as follows:

Theorem

Any infinitely differentiable function [math]f:\mathbb R\to\mathbb R[/math] can be locally approximated as

[[math]] f(x+t)=\sum_{k=0}^\infty\frac{f^{(k)}(x)}{k!}\,t^k [[/math]]
where [math]f^{(k)}(x)[/math] are the higher derivatives of [math]f[/math] at the point [math]x[/math].


Show Proof

Consider the function to be approximated, namely:

[[math]] \varphi(t)=f(x+t) [[/math]]


Let us try to best approximate this function at a given order [math]n\in\mathbb N[/math]. We are therefore looking for a certain polynomial in [math]t[/math], of the following type:

[[math]] P(t)=a_0+a_1t+\ldots+a_nt^n [[/math]]


The natural conditions to be imposed are those stating that [math]P[/math] and [math]\varphi[/math] should match at [math]t=0[/math], at the level of the actual value, of the derivative, second derivative, and so on, up to the [math]n[/math]-th derivative. Thus, we are led to the approximation in the statement:

[[math]] f(x+t)\simeq\sum_{k=0}^n\frac{f^{(k)}(x)}{k!}\,t^k [[/math]]


In order to prove now that this approximation holds indeed, we can use L'Hôpital's rule, applied several times, as in the proof of Theorem 3.17. To be more precise, if we denote by [math]\varphi(t)\simeq P(t)[/math] the approximation to be proved, we have:

[[math]] \begin{eqnarray*} \frac{\varphi(t)-P(t)}{t^n} &\simeq&\frac{\varphi'(t)-P'(t)}{nt^{n-1}}\\ &\simeq&\frac{\varphi''(t)-P''(t)}{n(n-1)t^{n-2}}\\ &\vdots&\\ &\simeq&\frac{\varphi^{(n)}(t)-P^{(n)}(t)}{n!}\\ &=&\frac{f^{(n)}(x)-f^{(n)}(x)}{n!}\\ &=&0 \end{eqnarray*} [[/math]]


Thus, we are led to the conclusion in the statement.
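As a numerical illustration of the Taylor formula (a sketch only, using [math]f=\exp[/math], whose derivatives at [math]0[/math] all equal [math]1[/math]), one can watch the order-[math]n[/math] approximation improve:

```python
# The order-n Taylor polynomial of exp at x = 0, compared with e^t.
import math

def taylor_exp(t, n):
    # f = exp has f^(k)(0) = 1, so the Taylor sum is sum t^k / k!
    return sum(t ** k / math.factorial(k) for k in range(n + 1))

t = 0.5
errors = [abs(taylor_exp(t, n) - math.exp(t)) for n in range(8)]
assert all(e2 < e1 for e1, e2 in zip(errors, errors[1:]))  # error shrinks
assert errors[-1] < 1e-6
```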

Here is a related interesting statement, inspired from the above proof:

Proposition

For a polynomial of degree [math]n[/math], the Taylor approximation

[[math]] f(x+t)\simeq\sum_{k=0}^n\frac{f^{(k)}(x)}{k!}\,t^k [[/math]]
is an equality. The converse of this statement holds too.


Show Proof

By linearity, it is enough to check the equality in question for the monomials [math]f(x)=x^p[/math], with [math]p\leq n[/math]. But here, the formula to be proved is as follows:

[[math]] (x+t)^p=\sum_{k=0}^p\frac{p(p-1)\ldots(p-k+1)}{k!}\,x^{p-k}t^k [[/math]]


We recognize the binomial formula, so our result holds indeed. As for the converse, this is clear, because the Taylor approximation is a polynomial of degree [math]n[/math].

There are many other things that can be said about the Taylor formula, at the theoretical level, notably with a study of the remainder, when truncating this formula at a given order [math]n\in\mathbb N[/math]. We will be back to this later, in chapter 4 below.


As an application of the Taylor formula, we can now improve the binomial formula, which was actually our main tool so far, in the following way:

Theorem

We have the following generalized binomial formula, with [math]p\in\mathbb R[/math],

[[math]] (x+t)^p=\sum_{k=0}^\infty\binom{p}{k}x^{p-k}t^k [[/math]]
with the generalized binomial coefficients being given by the formula

[[math]] \binom{p}{k}=\frac{p(p-1)\ldots(p-k+1)}{k!} [[/math]]
valid for any [math]|t| \lt |x|[/math]. With [math]p\in\mathbb N[/math], we recover the usual binomial formula.


Show Proof

It is customary to divide everything by [math]x^p[/math], which is the same as assuming [math]x=1[/math]. The formula to be proved is then as follows, under the assumption [math]|t| \lt 1[/math]:

[[math]] (1+t)^p=\sum_{k=0}^\infty\binom{p}{k}t^k [[/math]]


Let us discuss now the validity of this formula, depending on [math]p\in\mathbb R[/math]:


(1) Case [math]p\in\mathbb N[/math]. According to our definition of the generalized binomial coefficients, we have [math]\binom{p}{k}=0[/math] for [math]k \gt p[/math], so the series is stationary, and the formula to be proved is:

[[math]] (1+t)^p=\sum_{k=0}^p\binom{p}{k}t^k [[/math]]


But this is the usual binomial formula, which holds for any [math]t\in\mathbb R[/math].


(2) Case [math]p=-1[/math]. Here we can use the following formula, valid for [math]|t| \lt 1[/math]:

[[math]] \frac{1}{1+t}=1-t+t^2-t^3+\ldots [[/math]]


But this is exactly our generalized binomial formula at [math]p=-1[/math], because:

[[math]] \binom{-1}{k} =\frac{(-1)(-2)\ldots(-k)}{k!} =(-1)^k [[/math]]


(3) Case [math]p\in-\mathbb N[/math]. This is a continuation of our study at [math]p=-1[/math], which will finish the study at [math]p\in\mathbb Z[/math]. With [math]p=-m[/math], the generalized binomial coefficients are:

[[math]] \begin{eqnarray*} \binom{-m}{k} &=&\frac{(-m)(-m-1)\ldots(-m-k+1)}{k!}\\ &=&(-1)^k\frac{m(m+1)\ldots(m+k-1)}{k!}\\ &=&(-1)^k\frac{(m+k-1)!}{(m-1)!k!}\\ &=&(-1)^k\binom{m+k-1}{m-1} \end{eqnarray*} [[/math]]


Thus, our generalized binomial formula at [math]p=-m[/math] reads:

[[math]] \frac{1}{(1+t)^m}=\sum_{k=0}^\infty(-1)^k\binom{m+k-1}{m-1}t^k [[/math]]


But this is something which holds indeed, as we know from chapter 2.


(4) General case, [math]p\in\mathbb R[/math]. As we can see, things escalate quickly, so we will skip the next step, [math]p\in\mathbb Q[/math], and discuss directly the case [math]p\in\mathbb R[/math]. Consider the following function:

[[math]] f(x)=x^p [[/math]]


The derivatives at [math]x=1[/math] are then given by the following formula:

[[math]] f^{(k)}(1)=p(p-1)\ldots(p-k+1) [[/math]]


Thus, the Taylor approximation at [math]x=1[/math] is as follows:

[[math]] f(1+t)=\sum_{k=0}^\infty\frac{p(p-1)\ldots(p-k+1)}{k!}\,t^k [[/math]]


But this is exactly our generalized binomial formula, so we are done with the case where [math]t[/math] is small. With a bit more care, we obtain that this holds for any [math]|t| \lt 1[/math], and we will leave this as an instructive exercise, and come back to it, later in this book.
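As a numerical check of the generalized binomial formula (an illustration only, with truncation at a high order, and our own function names), one can compare the series against [math](1+t)^p[/math] directly:

```python
# Check of the generalized binomial formula (1+t)^p = sum_k binom(p,k) t^k,
# truncated at a high order, for |t| < 1.
def gen_binom(p, k):
    """The generalized binomial coefficient p(p-1)...(p-k+1)/k!."""
    c = 1.0
    for j in range(k):
        c *= (p - j) / (j + 1)
    return c

def binom_series(p, t, order=200):
    return sum(gen_binom(p, k) * t ** k for k in range(order))

for p in (0.5, -0.5, 2.0, -3.0, 1.7):
    for t in (0.3, -0.4):
        assert abs(binom_series(p, t) - (1 + t) ** p) < 1e-9
```

Note that for [math]p\in\mathbb N[/math] the coefficients vanish past [math]k=p[/math], so the series is a finite sum, matching the usual binomial formula.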

We can see from the above the power of the Taylor formula, saving us from quite complicated combinatorics. Remember indeed the mess from chapter 2, when trying to directly establish particular cases of the generalized binomial formula. Gone all that.


As a main application now of our generalized binomial formula, which is something very useful in practice, we can extract square roots, as follows:

Proposition

We have the following formula,

[[math]] \sqrt{1+t}=1-2\sum_{k=1}^\infty C_{k-1}\left(\frac{-t}{4}\right)^k [[/math]]
with [math]C_k=\frac{1}{k+1}\binom{2k}{k}[/math] being the Catalan numbers. Also, we have

[[math]] \frac{1}{\sqrt{1+t}}=\sum_{k=0}^\infty D_k\left(\frac{-t}{4}\right)^k [[/math]]
with [math]D_k=\binom{2k}{k}[/math] being the central binomial coefficients.


Show Proof

This is something that we already know from chapter 2, but time now to review all this. At [math]p=1/2[/math], the generalized binomial coefficients are:

[[math]] \begin{eqnarray*} \binom{1/2}{k} &=&\frac{(1/2)(-1/2)\ldots(3/2-k)}{k!}\\ &=&(-1)^{k-1}\frac{(2k-2)!}{2^{k-1}(k-1)!2^kk!}\\ &=&-2\left(\frac{-1}{4}\right)^kC_{k-1} \end{eqnarray*} [[/math]]


Also, at [math]p=-1/2[/math], the generalized binomial coefficients are:

[[math]] \begin{eqnarray*} \binom{-1/2}{k} &=&\frac{(-1/2)(-3/2)\ldots(1/2-k)}{k!}\\ &=&(-1)^k\frac{(2k)!}{2^kk!2^kk!}\\ &=&\left(\frac{-1}{4}\right)^kD_k \end{eqnarray*} [[/math]]


Thus, Theorem 3.25 at [math]p=\pm1/2[/math] gives the formulae in the statement.
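The two square-root series above can also be checked numerically, as a Python sketch (illustration only), with [math]C_k[/math] the Catalan numbers and [math]D_k[/math] the central binomial coefficients:

```python
# Check of the two square-root series, truncated at a high order.
from math import comb, isclose

def sqrt_series(t, order=300):
    # C_{k-1} = binom(2k-2, k-1)/k, an integer (Catalan number)
    return 1 - 2 * sum((comb(2 * k - 2, k - 1) // k) * (-t / 4) ** k
                       for k in range(1, order))

def inv_sqrt_series(t, order=300):
    # D_k = binom(2k, k), the central binomial coefficient
    return sum(comb(2 * k, k) * (-t / 4) ** k for k in range(order))

for t in (0.3, -0.5, 0.9):
    assert isclose(sqrt_series(t), (1 + t) ** 0.5, abs_tol=1e-9)
    assert isclose(inv_sqrt_series(t), (1 + t) ** -0.5, abs_tol=1e-9)
```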

As another basic application of the Taylor series, we have:

Theorem

We have the following formulae,

[[math]] \sin x=\sum_{l=0}^\infty(-1)^l\frac{x^{2l+1}}{(2l+1)!}\quad,\quad \cos x=\sum_{l=0}^\infty(-1)^l\frac{x^{2l}}{(2l)!} [[/math]]
as well as the following formulae,

[[math]] e^x=\sum_{k=0}^\infty\frac{x^k}{k!}\quad,\quad \log(1+x)=\sum_{k=1}^\infty(-1)^{k+1}\frac{x^k}{k} [[/math]]
as Taylor series, and in general as well, with [math]|x| \lt 1[/math] needed for [math]\log[/math].


Show Proof

There are several statements here, the proofs being as follows:


(1) Regarding [math]\sin[/math] and [math]\cos[/math], we can use here the following formulae:

[[math]] (\sin x)'=\cos x\quad,\quad (\cos x)'=-\sin x [[/math]]


Thus, we can differentiate [math]\sin[/math] and [math]\cos[/math] as many times as we want to, so we can compute the corresponding Taylor series, and we obtain the formulae in the statement.


(2) Regarding [math]\exp[/math] and [math]\log[/math], here the needed formulae, which lead to the formulae in the statement for the corresponding Taylor series, are as follows:

[[math]] (e^x)'=e^x [[/math]]

[[math]] (\log x)'=x^{-1} [[/math]]

[[math]] (x^p)'=px^{p-1} [[/math]]


(3) Finally, the fact that the formulae in the statement extend beyond the small [math]t[/math] setting, coming from Taylor series, is something standard too. We will leave this as an instructive exercise, and come back to it later, in chapter 6 below.
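As a small numerical illustration of one of these series (a sketch only), here is a check of the logarithm expansion, which indeed requires [math]|x| \lt 1[/math]:

```python
# Check of the Taylor series of log(1+x), convergent for |x| < 1.
import math

def log_series(x, order=1000):
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, order))

for x in (0.5, -0.5, 0.9):
    assert abs(log_series(x) - math.log(1 + x)) < 1e-9
```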

3d. Differential equations

Good news, with the calculus that we know we can do some physics, in 1 dimension. Let us start with something immensely important, in the history of science:

Fact

Newton invented calculus for formulating the laws of motion as

[[math]] v=\dot{x}\quad,\quad a=\dot{v} [[/math]]

where [math]x,v,a[/math] are the position, speed and acceleration, and the dots are time derivatives.

To be more precise, the variable in Newton's physics is time [math]t\in\mathbb R[/math], playing the role of the variable [math]x\in\mathbb R[/math] that we have used in the above. And we are looking at a particle whose position is described by a function [math]x=x(t)[/math]. Then, it is quite clear that the speed of this particle should be described by the first derivative [math]v=x'(t)[/math], and that the acceleration of the particle should be described by the second derivative [math]a=v'(t)=x''(t)[/math].


Summarizing, with Newton's theory of derivatives, as we learned it in this chapter, we can certainly do some mathematics for the motion of bodies. But, for these bodies to move, we need them to be acted upon by some forces, right? The simplest such force is gravity, and in our present, modest 1 dimensional setting, we have:

Theorem

The equation of a gravitational free fall, in [math]1[/math] dimension, is

[[math]] \ddot{x}=-\frac{GM}{x^2} [[/math]]
with [math]M[/math] being the attracting mass, and [math]G\simeq 6.674\times 10^{-11}[/math] being a constant.


Show Proof

Assume indeed that we have a free falling object, in [math]1[/math] dimension:

[[math]] \xymatrix@R=20pt@C=10pt{ \circ_m\ar[d]\\ \bullet_M} [[/math]]


In order to use calculus as we know it, we must perform a rotation, so as to have all this happening on the [math]Ox[/math] axis. By doing this, and assuming that [math]M[/math] is fixed at [math]0[/math], our picture becomes as follows, with the attached numbers being now the coordinates:

[[math]] \xymatrix@R=20pt@C=20pt{ \bullet_0&\circ_x\ar[l]} [[/math]]


Now comes the physics. The gravitational force exerted by [math]M[/math], which is fixed in our formalism, on the object [math]m[/math] which moves, is subject to the following equations:

[[math]] F=-G\cdot\frac{Mm}{x^2}\quad,\quad F=ma\quad,\quad a=\dot{v}\quad,\quad v=\dot{x} [[/math]]


To be more precise, in the first equation [math]G\simeq 6.674\times 10^{-11}[/math] is the gravitational constant, in usual SI units, and the sign is [math]-[/math] because [math]F[/math] is attractive. The second equation is something standard and very intuitive, and the last two equations are those from Fact 3.28. Now observe that, with the above data for [math]F[/math], the equation [math]F=ma[/math] reads:

[[math]] -G\cdot\frac{Mm}{x^2}=m\ddot{x} [[/math]]


Thus, by simplifying, we are led to the equation in the statement.
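The free-fall equation can be explored numerically, for instance by small Euler steps, as in the following Python sketch (illustration only; the attracting mass and initial position are made-up test data, with [math]M[/math] roughly that of the Earth):

```python
# A small numerical integration of the free-fall equation x'' = -GM/x^2,
# by Euler steps. The mass and initial position are made-up test data.
G = 6.674e-11
M = 5.972e24           # attracting mass (roughly that of the Earth)
x, v = 7.0e6, 0.0      # start at rest, 7000 km from the center
dt = 0.1
for _ in range(1000):  # simulate 100 seconds
    a = -G * M / x ** 2    # acceleration, from the equation above
    v += a * dt
    x += v * dt
assert v < 0               # the object picks up downward speed
assert 6.9e6 < x < 7.0e6   # and has fallen, but only slightly so far
```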

As more physics, we can talk as well about waves in 1 dimension, as follows:

Theorem

The wave equation in [math]1[/math] dimension is

[[math]] \ddot{\varphi}=v^2\varphi'' [[/math]]
with the dot denoting time derivatives, and [math]v \gt 0[/math] being the propagation speed.


Show Proof

In order to understand the propagation of the waves, let us model the space, which is [math]\mathbb R[/math] for us, as a network of balls, with springs between them, as follows:

[[math]] \cdots\times\!\!\!\times\!\!\!\times\bullet\times\!\!\!\times\!\!\!\times\bullet\times\!\!\!\times\!\!\!\times\bullet\times\!\!\!\times\!\!\!\times\bullet\times\!\!\!\times\!\!\!\times\bullet\times\!\!\!\times\!\!\!\times\cdots [[/math]]


Now let us send an impulse, and see how balls will be moving. For this purpose, we zoom on one ball. The situation here is as follows, [math]l[/math] being the spring length:

[[math]] \cdots\cdots\bullet_{\varphi(x-l)}\times\!\!\!\times\!\!\!\times\bullet_{\varphi(x)}\times\!\!\!\times\!\!\!\times\bullet_{\varphi(x+l)}\cdots\cdots [[/math]]


We have two forces acting at [math]x[/math]. First is the Newton motion force, mass times acceleration, which is as follows, with [math]m[/math] being the mass of each ball:

[[math]] F_n=m\cdot\ddot{\varphi}(x) [[/math]]


And second is the Hooke force, displacement of the spring, times spring constant. Since we have two springs at [math]x[/math], this is as follows, [math]k[/math] being the spring constant:

[[math]] \begin{eqnarray*} F_h &=&F_h^r-F_h^l\\ &=&k(\varphi(x+l)-\varphi(x))-k(\varphi(x)-\varphi(x-l))\\ &=&k(\varphi(x+l)-2\varphi(x)+\varphi(x-l)) \end{eqnarray*} [[/math]]


We conclude that the equation of motion, in our model, is as follows:

[[math]] m\cdot\ddot{\varphi}(x)=k(\varphi(x+l)-2\varphi(x)+\varphi(x-l)) [[/math]]


Now let us take the limit of our model, as to reach to continuum. For this purpose we will assume that our system consists of [math]N \gt \gt 0[/math] balls, having a total mass [math]M[/math], and spanning a total distance [math]L[/math]. Thus, our previous infinitesimal parameters are as follows, with [math]K[/math] being the spring constant of the total system, which is of course lower than [math]k[/math]:

[[math]] m=\frac{M}{N}\quad,\quad k=KN\quad,\quad l=\frac{L}{N} [[/math]]


With these changes, the equation of motion found above reads:

[[math]] \ddot{\varphi}(x)=\frac{KN^2}{M}(\varphi(x+l)-2\varphi(x)+\varphi(x-l)) [[/math]]


Now observe that this equation can be written, more conveniently, as follows:

[[math]] \ddot{\varphi}(x)=\frac{KL^2}{M}\cdot\frac{\varphi(x+l)-2\varphi(x)+\varphi(x-l)}{l^2} [[/math]]


With [math]N\to\infty[/math], and therefore [math]l\to0[/math], we obtain in this way:

[[math]] \ddot{\varphi}(x)=\frac{KL^2}{M}\cdot\frac{d^2\varphi}{dx^2}(x) [[/math]]


Thus, we are led to the conclusion in the statement.
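The ball-and-spring derivation above translates directly into a finite-difference simulation, as in the following Python sketch (illustration only, with made-up parameters, and a standard leapfrog time step):

```python
# A small finite-difference simulation of the wave equation, discretizing
# phi(x+l) - 2 phi(x) + phi(x-l) exactly as in the derivation above.
import math

n, v, dx, dt = 200, 1.0, 0.1, 0.05    # dt < dx/v, for stability
phi = [math.exp(-((i - n // 2) * dx) ** 2) for i in range(n)]  # a bump
prev = phi[:]                          # zero initial velocity
c2 = (v * dt / dx) ** 2
for _ in range(100):
    nxt = phi[:]
    for i in range(1, n - 1):
        lap = phi[i + 1] - 2 * phi[i] + phi[i - 1]
        nxt[i] = 2 * phi[i] - prev[i] + c2 * lap
    prev, phi = phi, nxt
# the bump has split into two half-size pulses, moving left and right
assert 0.1 < max(phi) < 0.95
```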

Along the same lines, we can talk as well about the heat equation in 1D, as follows:

Theorem

The heat equation in [math]1[/math] dimension is

[[math]] \dot{\varphi}=\alpha\varphi'' [[/math]]
where [math]\alpha \gt 0[/math] is the thermal diffusivity of the medium.


Show Proof

As before with the wave equation, this is not exactly a theorem, but rather what comes out of experiments. However, we can justify it mathematically, as follows:


(1) As an intuitive explanation for this equation, since the second derivative [math]\varphi''[/math] computes the average value of a function [math]\varphi[/math] around a point, minus the value of [math]\varphi[/math] at that point, as we know from Proposition 3.18, the heat equation as formulated above tells us that the rate of change [math]\dot{\varphi}[/math] of the temperature of the material at any given point must be proportional, with proportionality factor [math]\alpha \gt 0[/math], to the average difference of temperature between that given point and the surrounding material. Which sounds reasonable.


(2) In practice now, we can use, a bit like before for the wave equation, a lattice model as follows, with distance [math]l \gt 0[/math] between the neighbors:

[[math]] \xymatrix@R=10pt@C=20pt{ \ar@{-}[r]&\circ_{x-l}\ar@{-}[r]^l&\circ_x\ar@{-}[r]^l&\circ_{x+l}\ar@{-}[r]& } [[/math]]


In order to model now heat diffusion, we have to implement the intuitive mechanism explained above, and in practice, this leads to a condition as follows, expressing the change of the temperature [math]\varphi[/math], over a small period of time [math]\delta \gt 0[/math]:

[[math]] \varphi(x,t+\delta)=\varphi(x,t)+\frac{\alpha\delta}{l^2}\sum_{x\sim y}\left[\varphi(y,t)-\varphi(x,t)\right] [[/math]]


But this leads, via manipulations as before, to [math]\dot{\varphi}(x,t)=\alpha\cdot\varphi''(x,t)[/math], as claimed.
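The lattice update rule above can be run as it stands, as in the following Python sketch (illustration only, with made-up parameters, and [math]\alpha\delta/l^2[/math] kept below the standard stability bound [math]1/2[/math]):

```python
# A small finite-difference simulation of the heat equation, using the
# lattice update rule above.
n, alpha, dx, dt = 100, 1.0, 0.1, 0.004
r = alpha * dt / dx ** 2            # here r = 0.4 < 1/2
phi = [0.0] * n
phi[n // 2] = 1.0                   # a hot spot in the middle
for _ in range(200):
    nxt = phi[:]
    for i in range(1, n - 1):
        nxt[i] = phi[i] + r * (phi[i + 1] - 2 * phi[i] + phi[i - 1])
    phi = nxt
assert max(phi) < 0.1               # the spike has flattened out
assert sum(phi) > 0.99              # heat is (nearly) conserved inside
```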

All this is very nice, so with the calculus that we know, we can certainly talk about physics. We will see later in this book how to deal with the above equations.

General references

Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].