Basic calculus
5a. Real analysis
We discuss in what follows some applications of the theory that we developed above, for the most part to questions in analysis. The idea will be that the functions of several variables [math]f:\mathbb R^N\to\mathbb R^M[/math] can be locally approximated by linear maps, in the same way as the functions [math]f:\mathbb R\to\mathbb R[/math] can be locally approximated by using derivatives:
[math]f(x+t)\simeq f(x)+f'(x)t[/math]
There are many things that can be said here, and we will be quite brief. As a plan for this chapter and the next one, we would like to quickly review the one-variable calculus, then develop the basics of multivariable calculus, and then get introduced to the Gaussian laws, and to probability theory in general. The instructions being as follows:
(1) In case all this is totally new to you, it is better to pause your reading of the present book at this point, and quickly learn some calculus. There are plenty of good books here, a standard choice being for instance the books of Lax-Terrell [1], [2].
(2) In case you know a bit about all this, stay with us. But have the books of Rudin [3], [4] nearby, for things not explained in what follows. And have a probability book nearby too, such as Feller [5] or Durrett [6], for some extra help with probability.
(3) Finally, in case you know analysis well, have of course some fun in quickly reading the material below. But, in parallel to this, upgrade of course, say by learning some differential geometry, for instance from the books of do Carmo [7], [8].
Getting started now, let us first discuss the simplest case, [math]f:\mathbb R\to\mathbb R[/math]. Here we have the following result, which is the starting point for everything in analysis:
Any function of one variable [math]f:\mathbb R\to\mathbb R[/math] is locally affine,
[math]f(x+t)\simeq f(x)+f'(x)t[/math]
with [math]f'(x)=\lim_{t\to0}\frac{f(x+t)-f(x)}{t}[/math] being the derivative of [math]f[/math] at the point [math]x[/math], provided that this limit converges.
Assume indeed that the limit in the statement converges. By multiplying by [math]t[/math], we obtain that we have, once again in the [math]t\to0[/math] limit:
[math]f(x+t)-f(x)\simeq f'(x)t[/math]
Thus, we are led to the conclusion in the statement.
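As a quick numerical illustration of this, here is a minimal sketch in Python, with the test function [math]f=\sin[/math] and the step sizes being our own illustrative choices:

```python
# Check f(x+t) ~ f(x) + f'(x)t for f = sin, at x = 1, where sin' = cos.
# The error should shrink like t^2.
import math

x = 1.0
for t in [0.1, 0.01, 0.001]:
    exact = math.sin(x + t)
    affine = math.sin(x) + math.cos(x) * t
    print(t, abs(exact - affine))
```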
As a first example, the derivatives of the power functions are as follows:
We have the differentiation formula
[math](x^p)'=px^{p-1}[/math]
valid for any exponent [math]p\in\mathbb R[/math].
We can do this in three steps, as follows:
(1) In the case [math]p\in\mathbb N[/math] we can use the binomial formula, which gives, as desired:
(2) In the general case [math]p\in\mathbb Q[/math], we can write [math]p=m/n[/math], with [math]m\in\mathbb N[/math] and [math]n\in\mathbb Z[/math], and again by using the binomial formula, we obtain, as desired:
(3) In the general case now, where [math]p\in\mathbb R[/math] is real, the same formula holds, namely [math](x^p)'=px^{p-1}[/math], by using what we found above, and a continuity argument.
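As a quick sanity check on this, here is a small Python sketch, comparing a central finite difference with [math]px^{p-1}[/math], at the illustrative point [math]x=2[/math], for an integer, a rational, and a real exponent:

```python
# Central finite differences against (x^p)' = p x^{p-1}, at x = 2.
import math

x, h = 2.0, 1e-6
for p in [3, 0.5, math.pi]:
    numeric = ((x + h)**p - (x - h)**p) / (2 * h)
    exact = p * x**(p - 1)
    print(p, numeric, exact)
```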
There are many other computations that can be done, and we will be back to this later. Now back to the general level, let us record here the following key result:
The derivatives are subject to the following rules:
- Leibniz rule: [math](fg)'=f'g+fg'[/math].
- Chain rule: [math](f\circ g)'=f'(g)g'[/math].
Both formulae follow from the definition of the derivative, as follows:
(1) Regarding products, we have the following computation:
(2) Regarding compositions, we have the following computation:
Thus, we are led to the conclusions in the statement.
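Both rules are easy to test numerically. Here is a minimal Python sketch, with [math]f=\sin[/math], [math]g=\exp[/math] and the evaluation point being our own illustrative choices:

```python
# Numerical check of the Leibniz rule and of the chain rule.
import math

h, x = 1e-6, 0.7

def d(F, x):
    # derivative by central finite difference
    return (F(x + h) - F(x - h)) / (2 * h)

# Leibniz rule: (fg)' = f'g + fg'
print(d(lambda t: math.sin(t) * math.exp(t), x),
      math.cos(x) * math.exp(x) + math.sin(x) * math.exp(x))

# Chain rule: (f o g)' = f'(g)g'
print(d(lambda t: math.sin(math.exp(t)), x),
      math.cos(math.exp(x)) * math.exp(x))
```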
There are many applications of the derivative, and we have for instance:
The local minima and maxima of a differentiable function [math]f:\mathbb R\to\mathbb R[/math] appear at the points [math]x\in\mathbb R[/math] where:
[math]f'(x)=0[/math]
However, the converse of this fact is not true in general.
The first assertion is clear from the formula in Theorem 5.1, namely:
[math]f(x+t)\simeq f(x)+f'(x)t[/math]
Indeed, if [math]f'(x)\neq0[/math], this formula shows that [math]f[/math] increases on one side of [math]x[/math] and decreases on the other, so [math]x[/math] cannot be a local minimum or maximum.
As for the converse, the simplest counterexample is [math]f(x)=x^3[/math], at [math]x=0[/math].
At a more advanced level now, we have the following result:
Any function of one variable [math]f:\mathbb R\to\mathbb R[/math] is locally quadratic,
[math]f(x+t)\simeq f(x)+f'(x)t+\frac{f''(x)}{2}\,t^2[/math]
provided that the second derivative [math]f''(x)[/math] exists.
This is something quite intuitive, when thinking geometrically. In practice, we can use L'Hôpital's rule, stating that the [math]0/0[/math] type limits can be computed as:
[math]\lim_{t\to0}\frac{f(t)}{g(t)}=\lim_{t\to0}\frac{f'(t)}{g'(t)}[/math]
Observe that this formula holds indeed, as an application of Theorem 5.1. Now by using this, if we denote by [math]\varphi(t)\simeq P(t)[/math] the formula to be proved, we have:
Thus, we are led to the conclusion in the statement.
The above result substantially improves Theorem 5.1, and there are many applications of it. We can improve for instance Proposition 5.4, as follows:
The local minima and maxima of a twice differentiable function [math]f:\mathbb R\to\mathbb R[/math] appear at the points [math]x\in\mathbb R[/math] where
[math]f'(x)=0[/math]
with the case [math]f''(x) \gt 0[/math] corresponding to a local minimum, and the case [math]f''(x) \lt 0[/math] corresponding to a local maximum.
The first assertion is something that we already know, from Proposition 5.4. As for the second assertion, we can use the formula in Theorem 5.5, which in the case [math]f'(x)=0[/math] reads:
[math]f(x+t)\simeq f(x)+\frac{f''(x)}{2}\,t^2[/math]
Indeed, assuming [math]f''(x)\neq 0[/math], it is clear that the condition [math]f''(x) \gt 0[/math] will produce a local minimum, and that the condition [math]f''(x) \lt 0[/math] will produce a local maximum.
We can further develop the above method, at order 3, at order 4, and so on, the ultimate result on the subject, called Taylor formula, being as follows:
Any function [math]f:\mathbb R\to\mathbb R[/math] can be locally approximated as
[math]f(x+t)\simeq\sum_{k=0}^n\frac{f^{(k)}(x)}{k!}\,t^k[/math]
where [math]f^{(k)}(x)[/math] are the higher derivatives of [math]f[/math] at the point [math]x[/math].
We use the same method as in the proof of Theorem 5.5. If we denote by [math]\varphi(t)\simeq P(t)[/math] the approximation to be proved, we have:
Thus, we are led to the conclusion in the statement.
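As a concrete numerical illustration, here is a sketch in Python, using the order-3 Taylor polynomial of [math]\log(1+x)[/math] at [math]0[/math], with the derivatives there computed by hand:

```python
# Taylor at order 3 for f(x) = log(1+x) around 0: the derivatives are
# f(0) = 0, f'(0) = 1, f''(0) = -1, f'''(0) = 2, giving
# P(t) = t - t^2/2 + t^3/3, with an error of order t^4.
import math

for t in [0.5, 0.1, 0.01]:
    P = t - t**2 / 2 + t**3 / 3
    print(t, abs(math.log(1 + t) - P))
```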
As a basic application of the Taylor formula, we have:
We have the following formulae,
[math]\sin x=\sum_{l=0}^\infty(-1)^l\frac{x^{2l+1}}{(2l+1)!}\ ,\quad \cos x=\sum_{l=0}^\infty(-1)^l\frac{x^{2l}}{(2l)!}[/math]
[math]e^x=\sum_{k=0}^\infty\frac{x^k}{k!}\ ,\quad \log(1+x)=\sum_{k=1}^\infty(-1)^{k+1}\frac{x^k}{k}[/math]
as equalities of functions, with the formula for [math]\log(1+x)[/math] being valid for [math]x\in(-1,1][/math].
There are several statements here, the proofs being as follows:
(1) Regarding [math]\sin[/math] and [math]\cos[/math], we can use here the following well-known formulae:
[math]\sin'=\cos\ ,\quad \cos'=-\sin[/math]
With these formulae in hand we can approximate both [math]\sin[/math] and [math]\cos[/math], and we get:
Thus, we can differentiate [math]\sin[/math] and [math]\cos[/math] as many times as we want to, and so we can compute the corresponding Taylor series, and we obtain the formulae in the statement.
(2) Regarding [math]\exp[/math] and [math]\log[/math], here the needed formulae, which lead to the formulae in the statement for the corresponding Taylor series, are as follows:
[math]\exp'=\exp\ ,\quad \log'(x)=\frac{1}{x}[/math]
(3) Finally, the fact that the formulae in the statement extend beyond the small [math]x[/math] setting, coming from Taylor series, is something standard too.
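To illustrate numerically, here is a short Python sketch, comparing partial sums of the above series with the library functions, the truncation order being our own choice:

```python
# Partial sums of the Taylor series of sin and exp, at x = 1.
import math

x, n = 1.0, 10
sin_sum = sum((-1)**l * x**(2*l + 1) / math.factorial(2*l + 1) for l in range(n))
exp_sum = sum(x**k / math.factorial(k) for k in range(n))
print(sin_sum, math.sin(x))
print(exp_sum, math.exp(x))
```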
As another basic application of the Taylor formula, we have:
We have the following generalized binomial formula, with [math]p\in\mathbb R[/math],
[math](1+t)^p=\sum_{k=0}^\infty\binom{p}{k}t^k[/math]
valid for [math]|t| \lt 1[/math], where the generalized binomial coefficients are given by [math]\binom{p}{k}=\frac{p(p-1)\cdots(p-k+1)}{k!}[/math].
Consider indeed the following function:
[math]f(x)=x^p[/math]
The derivatives at [math]x=1[/math] are then given by the following formula:
[math]f^{(k)}(1)=p(p-1)\cdots(p-k+1)[/math]
Thus, the Taylor approximation at [math]x=1[/math] is as follows:
[math]f(1+t)\simeq\sum_{k=0}^\infty\frac{p(p-1)\cdots(p-k+1)}{k!}\,t^k[/math]
But this is exactly our generalized binomial formula, so we are done with the case where [math]t[/math] is small. With a bit more care, we obtain that this holds for any [math]|t| \lt 1[/math].
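Here is a quick numerical test of this, as a Python sketch, with the exponent [math]p=1/2[/math] and the value of [math]t[/math] being our own illustrative choices:

```python
# The generalized binomial formula (1+t)^p = sum_k binom(p,k) t^k,
# tested at p = 1/2 and t = 0.3, where the series converges since |t| < 1.
import math

def gen_binom(p, k):
    # p(p-1)...(p-k+1)/k!, for any real exponent p
    c = 1.0
    for i in range(k):
        c *= p - i
    return c / math.factorial(k)

p, t = 0.5, 0.3
series = sum(gen_binom(p, k) * t**k for k in range(40))
print(series, (1 + t)**p)
```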
We can see from the above the power of the Taylor formula. As an application now of our generalized binomial formula, we can extract square roots, as follows:
We have the following formulae, valid for [math]|t| \lt 1[/math],
[math]\sqrt{1+t}=1-2\sum_{k=1}^\infty C_{k-1}\left(\frac{-t}{4}\right)^k\ ,\qquad \frac{1}{\sqrt{1+t}}=\sum_{k=0}^\infty\binom{2k}{k}\left(\frac{-t}{4}\right)^k[/math]
with [math]C_k=\frac{1}{k+1}\binom{2k}{k}[/math] being the Catalan numbers.
The above formulae both follow from Theorem 5.9, as follows:
(1) At [math]p=1/2[/math], the generalized binomial coefficients are:
[math]\binom{1/2}{k}=\frac{\frac{1}{2}\left(-\frac{1}{2}\right)\cdots\left(\frac{3-2k}{2}\right)}{k!}=-2\left(\frac{-1}{4}\right)^kC_{k-1}[/math]
(2) At [math]p=-1/2[/math], the generalized binomial coefficients are:
[math]\binom{-1/2}{k}=\frac{\left(-\frac{1}{2}\right)\left(-\frac{3}{2}\right)\cdots\left(\frac{1-2k}{2}\right)}{k!}=\left(\frac{-1}{4}\right)^k\binom{2k}{k}[/math]
Thus, we obtain the formulae in the statement.
Let us discuss as well the basics of integration theory. We will be very brief here, by insisting on the main concepts, and skipping technicalities. We first have:
We have the Riemann integration formula,
[math]\int_a^bf(x)\,dx=\lim_{N\to\infty}\frac{b-a}{N}\sum_{k=1}^Nf\left(a+k\cdot\frac{b-a}{N}\right)[/math]
valid for any continuous function [math]f:[a,b]\to\mathbb R[/math].
Assume indeed that we are given a continuous function [math]f:[a,b]\to\mathbb R[/math], and let us try to compute the signed area below its graph, called integral and denoted [math]\int_a^bf(x)dx[/math]. Obviously, this signed area equals [math]b-a[/math] times the average of the function on [math][a,b][/math], and we are led to the following formula, with [math]x_1,\ldots,x_N\in[a,b][/math] being randomly chosen:
[math]\int_a^bf(x)\,dx\approx\frac{b-a}{N}\sum_{k=1}^Nf(x_k)[/math]
This is the so-called Monte Carlo integration formula, which is extremely useful in practice, and is used by scientists, engineers and computers. However, for theoretical purposes, it is better to assume that the points [math]x_1,\ldots,x_N\in[a,b][/math] are equally spaced. With this choice, which works too, the formula that we obtain is as follows:
[math]\int_a^bf(x)\,dx\approx\frac{b-a}{N}\sum_{k=1}^Nf\left(a+k\cdot\frac{b-a}{N}\right)[/math]
Observe that this latter formula can be alternatively written as follows, which makes it clear that the formula holds indeed, as an approximation of an area by rectangles:
[math]\int_a^bf(x)\,dx\approx\sum_{k=1}^N\frac{b-a}{N}\cdot f\left(a+k\cdot\frac{b-a}{N}\right)[/math]
In any case, we have obtained the formula in the statement, and we are done.
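As an illustration of both methods, here is a minimal Python sketch, for the integral [math]\int_0^1x^2dx=1/3[/math], with the sample size being our own choice:

```python
# Monte Carlo and Riemann sums for int_0^1 x^2 dx = 1/3.
import random

def f(x):
    return x * x

N, a, b = 100000, 0.0, 1.0
monte_carlo = (b - a) * sum(f(random.uniform(a, b)) for _ in range(N)) / N
riemann = (b - a) * sum(f(a + (b - a) * k / N) for k in range(1, N + 1)) / N
print(monte_carlo, riemann)
```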
The above was of course extremely brief, and for more on all this, including further functions that can be integrated, we refer to Rudin [3], [4] or Lax-Terrell [1], [2]. As a useful piece of advice, however, always keep in mind the Monte Carlo formula, briefly evoked above, because that is the real thing, in connection with anything related to integration.
The derivatives and integrals are related in several subtle ways, and we have:
We have the following formulae, called fundamental theorem of calculus, integration by parts formula, and change of variable formula,
[math]\int_a^bf'(x)\,dx=f(b)-f(a)[/math]
[math]\int_a^bf'g=\big[fg\big]_a^b-\int_a^bfg'[/math]
[math]\int_a^bf(\varphi(t))\,\varphi'(t)\,dt=\int_{\varphi(a)}^{\varphi(b)}f(x)\,dx[/math]
valid for suitable functions [math]f,g,\varphi[/math].
Again, this is standard, the idea being that the first formula is clear from the area interpretation of the integral, and that the second and third formulae follow from it, by integrating respectively the Leibniz rule and the chain rule from Theorem 5.3.
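As a numerical illustration, here is a small Python sketch, checking integration by parts via Riemann sums, with [math]f(x)=x[/math] and [math]g(x)=e^x[/math] as our own test functions:

```python
# Check integration by parts on [0,1], with f(x) = x, g(x) = exp(x):
# int f'g + int fg' should equal f(1)g(1) - f(0)g(0) = e.
import math

N = 100000
h = 1.0 / N
xs = [(k + 0.5) * h for k in range(N)]        # midpoints
int_fpg = sum(math.exp(x) for x in xs) * h    # f' = 1
int_fgp = sum(x * math.exp(x) for x in xs) * h  # g' = exp
print(int_fpg + int_fgp, math.e)
```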
So long for one-variable calculus. For more on all this, we refer to any basic analysis book, good choices here being the books of Lax-Terrell [1], [2], or Rudin [3], [4].
5b. Several variables
Let us discuss now what happens in several variables. At order 1, the situation is quite similar to the one in 1 variable, but this time involving matrices, as follows:
Any function [math]f:\mathbb R^N\to\mathbb R^M[/math] can be locally approximated as
[math]f(x+t)\simeq f(x)+f'(x)t[/math]
with [math]f'(x)[/math] being the [math]M\times N[/math] matrix of partial derivatives at [math]x[/math],
[math]f'(x)=\left(\frac{\partial f_i}{\partial x_j}(x)\right)_{ij}[/math]
acting on the vectors [math]t\in\mathbb R^N[/math] by usual multiplication.
As a first observation, the formula in the statement makes sense indeed, as an equality, or rather approximation, of vectors in [math]\mathbb R^M[/math], as follows:
In order to prove now this formula, which does make sense, the idea is as follows:
(1) First of all, at [math]N=M=1[/math] what we have is a usual 1-variable function [math]f:\mathbb R\to\mathbb R[/math], and the formula in the statement is something that we know well, namely:
[math]f(x+t)\simeq f(x)+f'(x)t[/math]
(2) Let us discuss now the case [math]N=2,M=1[/math]. Here what we have is a function [math]f:\mathbb R^2\to\mathbb R[/math], and by using twice the basic approximation result from (1), we obtain:
(3) More generally, we can deal in this way with the general case [math]M=1[/math], with the formula here, obtained via a straightforward recurrence, being as follows:
(4) But this gives the result in the case where both [math]N,M\in\mathbb N[/math] are arbitrary too. Indeed, consider a function [math]f:\mathbb R^N\to\mathbb R^M[/math], and let us write it as follows:
We can apply (3) to each of the components [math]f_i:\mathbb R^N\to\mathbb R[/math], and we get:
But this collection of [math]M[/math] formulae tells us precisely that the following happens, as an equality, or rather approximation, of vectors in [math]\mathbb R^M[/math]:
Thus, we are led to the conclusion in the statement.
Generally speaking, Theorem 5.13 is what we need to know for upgrading from calculus to multivariable calculus. As a standard result here, we have:
We have the chain derivative formula
[math](f\circ g)'(x)=f'(g(x))\,g'(x)[/math]
as an equality of matrices.
Consider indeed a composition of functions, as follows:
According to Theorem 5.13, the derivatives of these functions are certain linear maps, corresponding to certain rectangular matrices, as follows:
Thus, our formula makes sense indeed. As for the proof, this comes from the following computation:
[math](f\circ g)(x+t)\simeq f\big(g(x)+g'(x)t\big)\simeq f(g(x))+f'(g(x))\,g'(x)t[/math]
Thus, we are led to the conclusion in the statement.
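As a numerical illustration, here is a Python sketch, approximating the Jacobians by finite differences, with the test functions being our own choices:

```python
# Jacobians by finite differences, checking (f o g)' = f'(g)g' as a
# product of matrices, for g: R^2 -> R^2 and f: R^2 -> R.
import math

def g(x):
    return [x[0] * x[1], x[0] + math.sin(x[1])]

def f(y):
    return [y[0]**2 + y[1]]

def jacobian(F, x):
    # rows = components of F, columns = partial derivatives
    h = 1e-6
    m = len(F(x))
    J = []
    for i in range(m):
        row = []
        for j in range(len(x)):
            xp, xm = list(x), list(x)
            xp[j] += h
            xm[j] -= h
            row.append((F(xp)[i] - F(xm)[i]) / (2 * h))
        J.append(row)
    return J

x = [0.3, 0.7]
Jg, Jf = jacobian(g, x), jacobian(f, g(x))
product = [[sum(Jf[0][k] * Jg[k][j] for k in range(2)) for j in range(2)]]
print(product)
print(jacobian(lambda t: f(g(t)), x))  # should match
```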
Regarding now the higher derivatives, the situation here is more complicated. Let us record, however, the following fundamental result, happening at order 2, and which does the job, the job in analysis being usually that of finding the minima or maxima:
Given a function [math]f:\mathbb R^N\to\mathbb R[/math], construct its Hessian, as being:
[math]H(f)(x)=\left(\frac{\partial^2f}{\partial x_i\partial x_j}(x)\right)_{ij}[/math]
We have then the following order [math]2[/math] approximation of [math]f[/math] around a given [math]x\in\mathbb R^N[/math],
[math]f(x+t)\simeq f(x)+f'(x)t+\frac{1}{2} \lt H(f)(x)t,t \gt [/math]
in the [math]t\to0[/math] limit.
This is something very standard, the idea being as follows:
(1) At [math]N=1[/math] the Hessian matrix is the [math]1\times1[/math] matrix having as entry the usual [math]f''(x)[/math], and the formula in the statement is something that we know well, namely:
[math]f(x+t)\simeq f(x)+f'(x)t+\frac{f''(x)}{2}\,t^2[/math]
(2) In general, our claim is that the formula in the statement follows from the one-variable formula above, applied to the restriction of [math]f[/math] to the following segment in [math]\mathbb R^N[/math]:
To be more precise, let [math]y\in\mathbb R^N[/math], and consider the following function, with [math]r\in\mathbb R[/math]:
[math]g(r)=f(x+ry)[/math]
We know from (1) that the Taylor formula for [math]g[/math], at the point [math]r=0[/math], reads:
[math]g(r)\simeq g(0)+g'(0)r+\frac{g''(0)}{2}\,r^2[/math]
And our claim is that, with [math]t=ry[/math], this is precisely the formula in the statement.
(3) So, let us see if our claim is correct. By using the chain rule, we have the following formula, with on the right, as usual, a row vector multiplied by a column vector:
[math]g'(r)=f'(x+ry)\,y[/math]
By using again the chain rule, we can compute the second derivative as well:
[math]g''(r)= \lt H(f)(x+ry)y,y \gt [/math]
(4) Time now to conclude. We know that we have [math]g(r)=f(x+ry)[/math], and according to our various computations above, we have the following formulae:
[math]g(0)=f(x)\ ,\quad g'(0)=f'(x)y\ ,\quad g''(0)= \lt H(f)(x)y,y \gt [/math]
But with this data in hand, the usual Taylor formula for our one variable function [math]g[/math], at order 2, at the point [math]r=0[/math], takes the following form, with [math]t=ry[/math]:
[math]f(x+t)\simeq f(x)+f'(x)t+\frac{1}{2} \lt H(f)(x)t,t \gt [/math]
Thus, we have obtained the formula in the statement.
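Here is a quick numerical test of the above order-2 formula, as a Python sketch, with [math]f(x,y)=e^x\sin y[/math] as our own test function, and the gradient and Hessian computed by hand:

```python
# Order-2 approximation for f(x,y) = exp(x) sin(y) at the point (0,1):
# grad = (sin 1, cos 1), Hessian = [[sin 1, cos 1], [cos 1, -sin 1]].
import math

def f(v):
    return math.exp(v[0]) * math.sin(v[1])

x = [0.0, 1.0]
grad = [math.sin(1.0), math.cos(1.0)]
H = [[math.sin(1.0), math.cos(1.0)],
     [math.cos(1.0), -math.sin(1.0)]]

t = [0.01, -0.02]
order2 = f(x) + sum(grad[i] * t[i] for i in range(2)) \
       + 0.5 * sum(t[i] * H[i][j] * t[j] for i in range(2) for j in range(2))
print(f([x[0] + t[0], x[1] + t[1]]), order2)  # agree up to O(|t|^3)
```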
Getting now to integration, as a key result here, we have:
Given a transformation [math]\varphi=(\varphi_1,\ldots,\varphi_N)[/math], we have
[math]\int_Ef(x)\,dx=\int_{\varphi^{-1}(E)}f(\varphi(t))\,|J_\varphi(t)|\,dt[/math]
with [math]J_\varphi(t)=\det\left(\frac{\partial\varphi_i}{\partial t_j}(t)\right)_{ij}[/math] being the Jacobian, and with this generalizing the [math]1[/math]-variable formula that we know well.
This is something quite tricky, the idea being as follows:
(1) Observe first that this generalizes indeed the change of variable formula in 1 dimension, from Theorem 5.12, the point here being that the absolute value on the derivative appears so as to compensate for the lack of explicit bounds for the integral.
(2) In general now, we can first argue that, the formula in the statement being linear in [math]f[/math], we can assume [math]f=1[/math]. Thus we want to prove [math]vol(E)=\int_{\varphi^{-1}(E)}|J_\varphi(t)|dt[/math], and with [math]D={\varphi^{-1}(E)}[/math], this amounts to proving [math]vol(\varphi(D))=\int_D|J_\varphi(t)|dt[/math].
(3) Now since this latter formula is additive with respect to [math]D[/math], it is enough to prove that [math]vol(\varphi(D))=\int_D J_\varphi(t)dt[/math], for small cubes [math]D[/math], and assuming [math]J_\varphi \gt 0[/math]. But this follows by using the definition of the determinant as a volume, as in chapter 2.
(4) The details and computations, however, are quite non-trivial, and can be found for instance in Rudin [3]. So, please read Rudin; going through the complete proof of the present theorem there is part of the standard math experience.
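As a simple numerical illustration of the theorem, here is a Python sketch computing the area of the unit disk via the transformation [math](r,t)\to(r\cos t,r\sin t)[/math], whose Jacobian equals [math]r[/math], a fact that we will establish right below:

```python
# Area of the unit disk in polar coordinates: since the Jacobian is r,
# the area is int_0^1 int_0^{2 pi} r dt dr = pi. Midpoint rule in r.
import math

N = 1000
dr = 1.0 / N
area = sum((i + 0.5) * dr for i in range(N)) * dr * 2 * math.pi
print(area, math.pi)
```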
5c. Volumes of spheres
We can discuss now some more advanced questions, related to the computation of volumes of the spheres, and to the integration over spheres. Before anything, do you know what [math]\pi[/math] is? I bet not, or at least my students usually don't. So, let me teach you:
Assuming that the length of the unit circle is [math]L=2\pi[/math], where [math]\pi=3.14159\ldots[/math], the area of the unit disk is [math]A=\pi[/math].
This follows by drawing polygons, and taking the [math]N\to\infty[/math] limit. To be more precise, let us cut the disk as a pizza, into [math]N[/math] slices, and leave aside the rounded parts:
The area to be eaten can then be computed as follows, where [math]H[/math] is the height of the slices, [math]S[/math] is the length of their sides, and [math]P=NS[/math] is the total length of the sides:
[math]A=N\cdot\frac{HS}{2}=\frac{HP}{2}\to\frac{1\cdot2\pi}{2}=\pi[/math]
Indeed, in the [math]N\to\infty[/math] limit the height [math]H[/math] converges to the radius [math]1[/math], and the total side length [math]P[/math] converges to the circle length [math]2\pi[/math].
Thus, we are led to the conclusion in the statement.
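For a numerical version of this, here is a Python sketch of Archimedes' classical method, with the half-perimeters of inscribed polygons converging to [math]\pi[/math]:

```python
# Archimedes' method: half-perimeters of inscribed regular polygons,
# with the number of sides doubling at each step, converge to pi.
import math

s, N = 1.0, 6  # the inscribed hexagon has side 1
for _ in range(12):
    print(N, N * s / 2)
    s = math.sqrt(2 - math.sqrt(4 - s * s))  # side of the 2N-gon
    N *= 2
```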
In [math]N[/math] dimensions now, things are more complicated, and we will need spherical coordinates, in order to deal with such questions. Let us start with:
We have polar coordinates in [math]2[/math] dimensions,
[math]\begin{cases}x=r\cos t\\ y=r\sin t\end{cases}[/math]
the corresponding Jacobian being [math]J=r[/math].
This is something elementary, the Jacobian being given by:
[math]J=\begin{vmatrix}\cos t&-r\sin t\\ \sin t&r\cos t\end{vmatrix}=r\cos^2t+r\sin^2t=r[/math]
Thus, we have indeed the formula in the statement.
In 3 dimensions now the formula is similar, as follows:
We have spherical coordinates in [math]3[/math] dimensions,
[math]\begin{cases}x=r\cos s\\ y=r\sin s\cos t\\ z=r\sin s\sin t\end{cases}[/math]
the corresponding Jacobian being [math]J=r^2\sin s[/math].
The fact that we have indeed spherical coordinates is clear. Regarding now the Jacobian, this is given by the following formula:
[math]J=\begin{vmatrix}\cos s&-r\sin s&0\\ \sin s\cos t&r\cos s\cos t&-r\sin s\sin t\\ \sin s\sin t&r\cos s\sin t&r\sin s\cos t\end{vmatrix}=r^2\sin s[/math]
Thus, we have indeed the formula in the statement.
Let us work out now the spherical coordinate formula in [math]N[/math] dimensions. The result here, which generalizes those at [math]N=2,3[/math], is as follows:
We have spherical coordinates in [math]N[/math] dimensions,
[math]\begin{cases}x_1=r\cos t_1\\ x_2=r\sin t_1\cos t_2\\ \ \vdots\\ x_{N-1}=r\sin t_1\sin t_2\ldots\sin t_{N-2}\cos t_{N-1}\\ x_N=r\sin t_1\sin t_2\ldots\sin t_{N-2}\sin t_{N-1}\end{cases}[/math]
the corresponding Jacobian being [math]J=r^{N-1}\sin^{N-2}t_1\sin^{N-3}t_2\ldots\sin^2t_{N-3}\sin t_{N-2}[/math].
As before, the fact that we have spherical coordinates is clear. Regarding now the Jacobian, also as before, by developing over the last column, we have:
Thus, we obtain the formula in the statement, by recurrence.
As a comment here, the above convention for spherical coordinates, which is particularly beautiful, is one among many. Physicists for instance like to write things a bit upside down, and the same is actually true for physicists' notation for scalar products, which is [math] \lt x,y \gt =\sum_i\bar{x}_iy_i[/math], again upside down, and for many other things. If this bothers you, I can only recommend my physics book [9], written using mathematicians' notations.
By the way, talking physics, again and I insist, you should learn some, normally from Feynman [10], [11], [12], or Griffiths [13], [14], [15], or Weinberg [16], [17], [18]. The point indeed is that using spherical coordinates, while usually labelled “unconceptual”, and avoided by mathematicians, is the ABC of physicists, who use them all the time. Want to do some basic electrodynamics computations? Spherical coordinates. Want to solve the hydrogen atom? Spherical coordinates, again. And so on. So, nothing better than learning some physics, in order to get to know, and love, these spherical coordinates. And love them we will, in this book, from the bottom of our hearts.
Back to work now, let us compute the volumes of spheres. For this purpose, we must understand how the products of coordinates integrate over spheres. Let us start with the case [math]N=2[/math]. Here the sphere is the unit circle [math]\mathbb T[/math], and with [math]z=e^{it}[/math] the coordinates are [math]\cos t,\sin t[/math]. We can first integrate arbitrary powers of these coordinates, as follows:
We have the following formulae,
[math]\int_0^{\pi/2}\cos^pt\,dt=\int_0^{\pi/2}\sin^pt\,dt=\left(\frac{\pi}{2}\right)^{\varepsilon(p)}\frac{(p-1)!!}{p!!}[/math]
where [math]\varepsilon(p)=1[/math] if [math]p[/math] is even, and [math]\varepsilon(p)=0[/math] if [math]p[/math] is odd, and where the double factorial is given by [math]p!!=p(p-2)(p-4)\cdots[/math], ending at [math]2[/math] or [math]1[/math].
Let us first compute the integral on the left, namely [math]I_p=\int_0^{\pi/2}\cos^pt\,dt[/math]. We have:
[math](\cos^{p-1}t\sin t)'=p\cos^pt-(p-1)\cos^{p-2}t[/math]
By integrating between [math]0[/math] and [math]\pi/2[/math], we obtain the following formula:
[math]pI_p=(p-1)I_{p-2}[/math]
Thus we can compute [math]I_p[/math] by recurrence, and we obtain:
[math]I_p=\frac{p-1}{p}\cdot\frac{p-3}{p-2}\cdot\frac{p-5}{p-4}\cdots[/math]
On the other hand, at [math]p=0[/math] we have the following formula:
[math]I_0=\int_0^{\pi/2}1\,dt=\frac{\pi}{2}[/math]
Also, at [math]p=1[/math] we have the following formula:
[math]I_1=\int_0^{\pi/2}\cos t\,dt=1[/math]
Thus, we obtain the result, by recurrence. As for the second formula, regarding [math]\sin t[/math], this follows from the first formula, with the change of variables [math]t=\frac{\pi}{2}-s[/math].
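As a numerical check of the above formulae, here is a Python sketch, using a crude midpoint rule for the integrals, with the sample size being our own choice:

```python
# Check int_0^{pi/2} cos^p t dt = (pi/2)^{e(p)} (p-1)!!/p!!,
# with e(p) = 1 for p even, e(p) = 0 for p odd.
import math

def dblfact(n):
    return 1 if n <= 0 else n * dblfact(n - 2)

def I(p, N=100000):
    h = math.pi / 2 / N
    return sum(math.cos((k + 0.5) * h)**p for k in range(N)) * h

for p in range(1, 7):
    eps = 1 if p % 2 == 0 else 0
    formula = (math.pi / 2)**eps * dblfact(p - 1) / dblfact(p)
    print(p, I(p), formula)
```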
We can now compute the volume of the sphere, as follows:
The volume of the unit sphere in [math]\mathbb R^N[/math] is given by
[math]V=\left(\frac{\pi}{2}\right)^{[N/2]}\frac{2^N}{N!!}[/math]
with [math][\,\cdot\,][/math] being the integer part, and with the double factorial convention from the above proof.
If we denote by [math]B^+[/math] the positive part of the unit sphere, we have:
Thus, we are led to the formula in the statement.
As main particular cases of the above formula, we have:
The volumes of the low-dimensional spheres are as follows:
- At [math]N=1[/math], the length of the unit interval is [math]V=2[/math].
- At [math]N=2[/math], the area of the unit disk is [math]V=\pi[/math].
- At [math]N=3[/math], the volume of the unit sphere is [math]V=\frac{4\pi}{3}[/math].
- At [math]N=4[/math], the volume of the corresponding unit sphere is [math]V=\frac{\pi^2}{2}[/math].
These are all particular cases of the formula in Theorem 5.22.
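As a numerical check, here is a Python sketch comparing Monte Carlo estimates with the formula of Theorem 5.22, the sample size being our own choice:

```python
# Monte Carlo estimates of the volume of the unit ball in R^N,
# against the closed formula V = (pi/2)^[N/2] 2^N / N!!.
import math, random

def dblfact(n):
    return 1 if n <= 0 else n * dblfact(n - 2)

def V(N):
    return (math.pi / 2)**(N // 2) * 2**N / dblfact(N)

def V_mc(N, samples=200000):
    hits = sum(1 for _ in range(samples)
               if sum(random.uniform(-1, 1)**2 for _ in range(N)) <= 1)
    return 2**N * hits / samples  # fraction of the cube [-1,1]^N

for N in [1, 2, 3, 4]:
    print(N, V_mc(N), V(N))
```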
5d. Basic estimates
In order to obtain estimates for the volumes, in the large [math]N[/math] limit, we can use:
We have the Stirling formula
[math]N!\simeq\left(\frac{N}{e}\right)^N\sqrt{2\pi N}[/math]
valid in the [math]N\to\infty[/math] limit.
This is something quite tricky, the idea being as follows:
(1) We have the following basic approximation, by using a Riemann sum:
[math]\log(N!)=\sum_{k=1}^N\log k\approx\int_1^N\log x\,dx=N\log N-N+1[/math]
(2) By exponentiating we get [math]N!\approx(N/e)^Ne[/math], which is not bad, but not enough. So, we have to fine-tune our method. By using trapezoids instead of rectangles, we get:
[math]\log(N!)\approx\int_1^N\log x\,dx+\frac{\log N}{2}[/math]
Thus, we have [math]\log(N!)\approx N\log N-N+\frac{\log N}{2}+1[/math], which by exponentiating gives:
[math]N!\approx\left(\frac{N}{e}\right)^Ne\sqrt{N}[/math]
(3) This is better than before, but still not enough. So, we have to further fine-tune our method, and by using this time some heavier analysis methods, the idea is that we can estimate the error, with [math]\approx[/math] becoming [math]\simeq[/math], and with the [math]e[/math] factor becoming [math]\sqrt{2\pi}[/math].
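Numerically, the quality of the Stirling approximation can be seen as follows, in a minimal Python sketch:

```python
# The ratio N! / [(N/e)^N sqrt(2 pi N)] tends to 1.
import math

for N in [5, 10, 100]:
    stirling = (N / math.e)**N * math.sqrt(2 * math.pi * N)
    print(N, math.factorial(N) / stirling)
```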
With the above formula in hand, we have many useful applications, such as:
We have the following estimate for binomial coefficients,
[math]\binom{N}{tN}\approx\frac{1}{\sqrt{2\pi t(1-t)N}}\left(\frac{1}{t^t(1-t)^{1-t}}\right)^N[/math]
valid for [math]t\in(0,1)[/math], in the [math]N\to\infty[/math] limit, and in particular we have
[math]\binom{2N}{N}\simeq\frac{4^N}{\sqrt{\pi N}}[/math]
All this is very standard, by using the Stirling formula established above, for the various factorials which appear, the idea being as follows:
(1) This follows from the definition of the binomial coefficients, namely:
[math]\binom{N}{K}=\frac{N!}{K!(N-K)!}[/math]
Thus, we are led to the conclusion in the statement.
(2) This estimate follows from a similar computation, as follows:
Alternatively, we can take [math]t=1/2[/math] in (1), then rescale. Indeed, at [math]t=1/2[/math] we have:
[math]\binom{N}{N/2}\approx\sqrt{\frac{2}{\pi N}}\cdot2^N[/math]
Thus with the change [math]N\to 2N[/math] we obtain the formula in the statement.
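Here is a quick numerical check of the second estimate, as a Python sketch:

```python
# The ratio binom(2N,N) / [4^N / sqrt(pi N)] tends to 1.
import math

for N in [10, 50, 100]:
    print(N, math.comb(2 * N, N) / (4**N / math.sqrt(math.pi * N)))
```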
Summarizing, we have so far complete estimates for the factorials and binomial coefficients. Regarding now the double factorials, which we will need as well, the result here is as follows:
We have the following estimates for the double factorials,
[math]N!!\simeq\left(\frac{N}{e}\right)^{N/2}\sqrt{\pi N}\ \ (N\ even)\ ,\qquad N!!\simeq\left(\frac{N}{e}\right)^{N/2}\sqrt{2N}\ \ (N\ odd)[/math]
both valid in the [math]N\to\infty[/math] limit.
Once again this is standard, the idea being as follows:
(1) When [math]N=2K[/math] is even, we have the following computation:
(2) When [math]N=2K+1[/math] is odd, we have the following computation:
(3) Back to the case where [math]N=2K[/math] is even, by using (2) we obtain:
(4) Finally, back to the case where [math]N=2K+1[/math] is odd, by using (1) we obtain:
Thus, we have proved the estimates in the statement.
We can now estimate the volumes of the spheres, as follows:
The volume of the unit sphere in [math]\mathbb R^N[/math] is given by
[math]V\simeq\left(\frac{2\pi e}{N}\right)^{N/2}\frac{1}{\sqrt{N\pi}}[/math]
in the [math]N\to\infty[/math] limit.
We use Theorem 5.22. When [math]N[/math] is even, the estimate goes as follows:
In the case where [math]N[/math] is odd, the estimate goes as follows:
Thus, we are led to the uniform formula in the statement.
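Numerically, this estimate can be tested as follows, in a short Python sketch, with the exact volume taken from Theorem 5.22:

```python
# Exact volume against the estimate (2 pi e / N)^{N/2} / sqrt(N pi):
# the ratio tends to 1 as N grows.
import math

def dblfact(n):
    return 1 if n <= 0 else n * dblfact(n - 2)

for N in [10, 50, 100]:
    exact = (math.pi / 2)**(N // 2) * 2**N / dblfact(N)
    estimate = (2 * math.pi * math.e / N)**(N / 2) / math.sqrt(N * math.pi)
    print(N, exact / estimate)
```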
Getting back now to our main result so far, Theorem 5.22, we can compute in the same way the area of the sphere, the result being as follows:
The area of the unit sphere in [math]\mathbb R^N[/math] is given by
[math]A=\left(\frac{\pi}{2}\right)^{[N/2]}\frac{2^N}{(N-2)!!}[/math]
with the conventions from Theorem 5.22, and in the [math]N\to\infty[/math] limit this behaves as:
[math]A\simeq N\left(\frac{2\pi e}{N}\right)^{N/2}\frac{1}{\sqrt{N\pi}}[/math]
Regarding the first assertion, we can use the slicing argument from the proof of Theorem 5.17, which shows that the area and volume of the sphere in [math]\mathbb R^N[/math] are related by the following formula, which together with Theorem 5.22 gives the result:
[math]A=NV[/math]
As for the last assertion, this can be either worked out directly, or deduced from the results for volumes that we have so far, by multiplying by [math]N[/math].
General references
Banica, Teo (2024). "Linear algebra and group theory". arXiv:2206.09283 [math.CO].
References
- P. Lax and M.S. Terrell, Calculus with applications, Springer (2013).
- P. Lax and M.S. Terrell, Multivariable calculus with applications, Springer (2018).
- W. Rudin, Principles of mathematical analysis, McGraw-Hill (1964).
- W. Rudin, Real and complex analysis, McGraw-Hill (1966).
- W. Feller, An introduction to probability theory and its applications, Wiley (1950).
- R. Durrett, Probability: theory and examples, Cambridge Univ. Press (1990).
- M.P. do Carmo, Differential geometry of curves and surfaces, Dover (1976).
- M.P. do Carmo, Riemannian geometry, Birkhäuser (1992).
- T. Banica, Introduction to modern physics (2024).
- R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics I: mainly mechanics, radiation and heat, Caltech (1963).
- R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics II: mainly electromagnetism and matter, Caltech (1964).
- R.P. Feynman, R.B. Leighton and M. Sands, The Feynman lectures on physics III: quantum mechanics, Caltech (1966).
- D.J. Griffiths, Introduction to electrodynamics, Cambridge Univ. Press (2017).
- D.J. Griffiths and D.F. Schroeter, Introduction to quantum mechanics, Cambridge Univ. Press (2018).
- D.J. Griffiths, Introduction to elementary particles, Wiley (2020).
- S. Weinberg, Foundations of modern physics, Cambridge Univ. Press (2011).
- S. Weinberg, Lectures on quantum mechanics, Cambridge Univ. Press (2012).
- S. Weinberg, Lectures on astrophysics, Cambridge Univ. Press (2019).