Integration



4a. Integration theory

We have seen so far the foundations of calculus, with lots of interesting results regarding the functions [math]f:\mathbb R\to\mathbb R[/math], and their derivatives [math]f':\mathbb R\to\mathbb R[/math]. The general idea was that in order to understand [math]f[/math], we first need to compute its derivative [math]f'[/math]. The overall conclusion, coming from the Taylor formula, was that if we are able to compute [math]f'[/math], but then also [math]f''[/math], and [math]f'''[/math] and so on, we will have a good understanding of [math]f[/math] itself.


However, the story is not over here, and there is one more twist to the plot. Which will be a major twist, of similar magnitude to that of the Taylor formula. For reasons which are quite tricky, and which will become clear later on, we will be interested in the integration of the functions [math]f:\mathbb R\to\mathbb R[/math]. With the claim that this is related to calculus.


There are several possible viewpoints on the integral, which are all useful, and good to know. To start with, we have something very simple, as follows:

Definition

The integral of a continuous function [math]f:[a,b]\to\mathbb R[/math], denoted

[[math]] \int_a^bf(x)dx [[/math]]
is the area below the graph of [math]f[/math], signed [math]+[/math] where [math]f\geq0[/math], and signed [math]-[/math] where [math]f\leq0[/math].

Here it is of course understood that the area in question can be computed, this being something quite subtle, that we will get into later. For the moment, let us just trust our intuition, our function [math]f[/math] being continuous, the area in question can “obviously” be computed. More on this later, but for being rigorous, however, let us formulate:

Method

In practice, the integral of [math]f\geq 0[/math] can be computed as follows,

  • Cut the graph of [math]f[/math] from 3mm plywood,
  • Plunge that graph into a square container of water,
  • Measure the water displacement, so as to obtain the volume of the graph,
  • Divide that volume by [math]3\times 10^{-3}[/math], so as to obtain the area,

and for general [math]f[/math], we can use this plus [math]f=f_+-f_-[/math], with [math]f_+,f_-\geq0[/math].

So far, so good, we have a rigorous definition, so let us do now some computations. In order to compute areas, and so integrals of functions, without wasting precious water, we can use our geometric knowledge. Here are some basic results of this type:

Proposition

We have the following results:

  • When [math]f[/math] is linear, we have the following formula:
    [[math]] \int_a^bf(x)dx=(b-a)\cdot\frac{f(a)+f(b)}{2} [[/math]]
  • In fact, when [math]f[/math] is piecewise linear on [math][a=a_1,a_2,\ldots,a_n=b][/math], we have:
    [[math]] \int_a^bf(x)dx=\sum_{i=1}^{n-1}(a_{i+1}-a_i)\cdot\frac{f(a_i)+f(a_{i+1})}{2} [[/math]]
  • We have as well the formula [math]\int_{-1}^1\sqrt{1-x^2}\,dx=\pi/2[/math].


Show Proof

These results all follow from basic geometry, as follows:


(1) Assuming [math]f\geq0[/math], we must compute the area of a trapezoid having sides [math]f(a),f(b)[/math], and height [math]b-a[/math]. But this is the same as the area of a rectangle having side [math](f(a)+f(b))/2[/math] and height [math]b-a[/math], and we obtain [math](b-a)(f(a)+f(b))/2[/math], as claimed.


(2) This is clear indeed from the formula found in (1), by additivity.


(3) The integral in the statement is by definition the area of the upper unit half-disc. But since the area of the whole unit disc is [math]\pi[/math], this half-disc area is [math]\pi/2[/math].

As an interesting observation, (2) in the above result makes it quite clear that [math]f[/math] does not necessarily need to be continuous, in order to talk about its integral. Indeed, assuming that [math]f[/math] is piecewise linear on [math][a=a_1,a_2,\ldots,a_n=b][/math], but not necessarily continuous, we can still talk about its integral, in the obvious way, exactly as in Definition 4.1, and we have an explicit formula for this integral, generalizing the one found in (2), namely:

[[math]] \int_a^bf(x)dx=\sum_{i=1}^{n-1}(a_{i+1}-a_i)\cdot\frac{f(a_i^+)+f(a_{i+1}^-)}{2} [[/math]]


Based on this observation, let us upgrade our formalism, as follows:

Definition

We say that a function [math]f:[a,b]\to\mathbb R[/math] is integrable when the area below its graph is computable. In this case we denote by

[[math]] \int_a^bf(x)dx [[/math]]
this area, signed [math]+[/math] where [math]f\geq0[/math], and signed [math]-[/math] where [math]f\leq0[/math].

As basic examples of integrable functions, we have the continuous ones, provided that our intuition, or Method 4.2, indeed works for any such function. We will soon see that this is true, coming with mathematical proof. As further examples, we have the functions which are piecewise linear, or more generally piecewise continuous. We will also see, later, as another class of examples, that the piecewise monotone functions are integrable. But more on this later, let us not bother for the moment with all this.


This being said, one more thing regarding theory, a question that you surely have in mind: is every function integrable? Not clear. I would say that if the Devil comes with some sort of nasty, totally discontinuous function [math]f:\mathbb R\to\mathbb R[/math], then you will have big trouble cutting its graph from 3mm plywood, as required by Method 4.2. More on this later.


Back to work now, here are some general results regarding the integrals:

Proposition

We have the following formulae,

[[math]] \int_a^b\left(f(x)+g(x)\right)dx=\int_a^bf(x)dx+\int_a^bg(x)dx [[/math]]

[[math]] \int_a^b\lambda f(x)dx=\lambda\int_a^bf(x)dx [[/math]]
valid for any functions [math]f,g[/math] and any scalar [math]\lambda\in\mathbb R[/math].


Show Proof

Both these formulae are indeed clear from definitions.

Moving ahead now, past the above results, which are of purely algebraic and geometric nature, and perhaps a few more of the same type, which are all quite trivial and that we will not get into here, we must do some analysis, in order to compute integrals. This is something quite tricky, and we have here the following result:

Theorem

We have the Riemann integration formula,

[[math]] \int_a^bf(x)dx=(b-a)\times\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^Nf\left(a+\frac{b-a}{N}\cdot k\right) [[/math]]
which can serve as a definition for the integral.


Show Proof

This is standard, by drawing rectangles. We have indeed the following formula, which can stand as a definition for the signed area below the graph of [math]f[/math]:

[[math]] \int_a^bf(x)dx=\lim_{N\to\infty}\sum_{k=1}^N\frac{b-a}{N}\cdot f\left(a+\frac{b-a}{N}\cdot k\right) [[/math]]


Thus, we are led to the formula in the statement.
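For a quick taste of this formula in practice, here is a minimal numerical sketch, in Python, with the function [math]f(x)=x^2[/math] and the interval [math][0,1][/math] being of course just sample choices:

```python
# Riemann approximation of the integral of f over [a,b], as above:
# (b-a) times the average of f over N equally spaced sample points.
def riemann(f, a, b, N):
    return (b - a) / N * sum(f(a + (b - a) * k / N) for k in range(1, N + 1))

# Sample check, with f(x) = x^2 on [0,1]: the output is close to 1/3.
print(riemann(lambda x: x**2, 0.0, 1.0, 10**5))  # ~0.333338
```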

Observe that the Riemann formula suggests that [math]\int_a^bf(x)dx[/math] is the length of the interval [math][a,b][/math], namely [math]b-a[/math], times the average of [math]f[/math] on the interval [math][a,b][/math]. Thinking a bit, this is indeed something true, with no need for Riemann sums, coming directly from Definition 4.1, because area means side times average height. Thus, we can formulate:

Theorem

The integral of a function [math]f:[a,b]\to\mathbb R[/math] is given by

[[math]] \int_a^bf(x)dx=(b-a)\times A(f) [[/math]]
where [math]A(f)[/math] is the average of [math]f[/math] over the interval [math][a,b][/math].


Show Proof

As explained above, this is clear from Definition 4.1, via some geometric thinking. Alternatively, this is something which certainly comes from Theorem 4.6.

The point of view in Theorem 4.7 is something quite useful, and as an illustration for this, let us review the results that we already have, by using this interpretation. First, we have the formula for linear functions from Proposition 4.3, namely:

[[math]] \int_a^bf(x)dx=(b-a)\cdot\frac{f(a)+f(b)}{2} [[/math]]


But this formula is totally obvious with our new viewpoint, from Theorem 4.7. The same goes for the results in Proposition 4.5, which become even more obvious with the viewpoint from Theorem 4.7. However, not everything trivializes in this way, and the result which is left, namely the formula [math]\int_{-1}^1\sqrt{1-x^2}\,dx=\pi/2[/math] from Proposition 4.3 (3), not only does not trivialize, but becomes quite opaque with our new philosophy.


In short, modesty. Integration is a quite delicate business, and we have several equivalent points of view on what an integral means, and all these points of view are useful, and must be learned, with none of them being clearly better than the others.


Going ahead with more interpretations of the integral, we have:

Theorem

We have the Monte Carlo integration formula,

[[math]] \int_a^bf(x)dx=(b-a)\times\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^Nf(x_k) [[/math]]
with [math]x_1,\ldots,x_N\in[a,b][/math] being random.


Show Proof

We recall from Theorem 4.6 that the idea is that we have a formula as follows, with the points [math]x_1,\ldots,x_N\in[a,b][/math] being uniformly distributed:

[[math]] \int_a^bf(x)dx=(b-a)\times\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^Nf(x_k) [[/math]]


But this works as well when the points [math]x_1,\ldots,x_N\in[a,b][/math] are randomly distributed, the reason behind this being the law of large numbers, to be discussed later in this book, and this gives the result.

Observe that Monte Carlo integration works better than Riemann integration, for instance when trying to improve the estimate, via [math]N\to N+1[/math]. Indeed, in the context of Riemann integration, assume that we managed to find an estimate as follows, which in practice requires computing [math]N[/math] values of our function [math]f[/math], and making their average:

[[math]] \int_a^bf(x)dx\simeq\frac{b-a}{N}\sum_{k=1}^Nf\left(a+\frac{b-a}{N}\cdot k\right) [[/math]]


In order to improve this estimate, any extra computed value of our function [math]f(y)[/math] will be of no use. For improving our formula, what we need are [math]N[/math] extra values of our function, [math]f(y_1),\ldots,f(y_N)[/math], with the points [math]y_1,\ldots,y_N[/math] being the midpoints of the previous division of [math][a,b][/math], so that we can write an improvement of our formula, as follows:

[[math]] \int_a^bf(x)dx\simeq\frac{b-a}{2N}\sum_{k=1}^{2N}f\left(a+\frac{b-a}{2N}\cdot k\right) [[/math]]


With Monte Carlo, things are far more flexible. Assume indeed that we managed to find an estimate as follows, which again requires computing [math]N[/math] values of our function:

[[math]] \int_a^bf(x)dx\simeq\frac{b-a}{N}\sum_{k=1}^Nf(x_k) [[/math]]


Now if we want to improve this, any extra computed value of our function [math]f(y)[/math] will be helpful, because we can set [math]x_{N+1}=y[/math], and improve our estimate as follows:

[[math]] \int_a^bf(x)dx\simeq\frac{b-a}{N+1}\sum_{k=1}^{N+1}f(x_k) [[/math]]
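As an illustration for this [math]N\to N+1[/math] flexibility, here is a minimal Python sketch, with each extra random sample folded into a running estimate, and with the function being integrated again a sample choice:

```python
import random

# Monte Carlo estimate of the integral of f over [a,b], refined one
# extra random sample at a time: setting x_{N+1} = y updates the estimate.
def monte_carlo_stream(f, a, b, steps):
    total, estimate = 0.0, 0.0
    for n in range(1, steps + 1):
        y = random.uniform(a, b)  # any extra computed value f(y) helps
        total += f(y)
        estimate = (b - a) * total / n
    return estimate

print(monte_carlo_stream(lambda x: x**2, 0.0, 1.0, 10**5))  # ~1/3, noisy
```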


And isn't this potentially useful, and powerful, when thinking about practically computing integrals, either by hand, or by using a computer. Let us record this finding as follows:

Conclusion

Monte Carlo integration works better than Riemann integration, when it comes to computing as usual, by estimating, and refining the estimate.

As another interesting feature of Monte Carlo integration, this works better than Riemann integration, for functions having various symmetries, because Riemann integration can get “fooled” by these symmetries, while Monte Carlo remains strong.


As an example for this phenomenon, chosen to be quite drastic, let us attempt to integrate, via both Riemann and Monte Carlo, the following function [math]f:[0,\pi]\to\mathbb R[/math]:

[[math]] f(x)=\Big|\sin(120x)\Big| [[/math]]


The first few Riemann sums for this function are then as follows:

[[math]] I_2(f)=\frac{\pi}{2}(|\sin 0|+|\sin 60\pi|)=0 [[/math]]

[[math]] I_3(f)=\frac{\pi}{3}(|\sin 0|+|\sin 40\pi|+|\sin 80\pi|)=0 [[/math]]

[[math]] I_4(f)=\frac{\pi}{4}(|\sin 0|+|\sin 30\pi|+|\sin 60\pi|+|\sin 90\pi|)=0 [[/math]]

[[math]] I_5(f)=\frac{\pi}{5}(|\sin 0|+|\sin 24\pi|+|\sin 48\pi|+|\sin 72\pi|+|\sin 96\pi|)=0 [[/math]]

[[math]] I_6(f)=\frac{\pi}{6}(|\sin 0|+|\sin 20\pi|+|\sin 40\pi|+|\sin 60\pi|+|\sin 80\pi|+|\sin 100\pi|)=0 [[/math]]

[[math]] \vdots [[/math]]


Based on this evidence, we will conclude, obviously, that we have:

[[math]] \int_0^\pi f(x)dx=0 [[/math]]


With Monte Carlo, however, such things cannot happen. Indeed, since there are finitely many points [math]x\in[0,\pi][/math] having the property [math]\sin(120 x)=0[/math], a random point [math]x\in[0,\pi][/math] will have the property [math]|\sin(120 x)| \gt 0[/math], so Monte Carlo will give, at any [math]N\in\mathbb N[/math]:

[[math]] \int_0^\pi f(x)dx\simeq\frac{\pi}{N}\sum_{k=1}^Nf(x_k) \gt 0 [[/math]]
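Here is a minimal Python sketch of all this, with the Riemann sums at the above resolutions vanishing up to floating point error, while Monte Carlo, even with very few samples, sees that the integral is positive:

```python
import math, random

f = lambda x: abs(math.sin(120 * x))

# Riemann sums at resolutions N = 2,...,6, as above: the equally spaced
# sample points all hit zeroes of f, so the sums are ~0, up to rounding.
for N in range(2, 7):
    print(N, sum(math.pi / N * f(math.pi * k / N) for k in range(N)))

# Monte Carlo with just 6 random samples: strictly positive, at any run.
print(math.pi / 6 * sum(f(random.uniform(0, math.pi)) for _ in range(6)))
```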


Again, this is something interesting, when practically computing integrals, either by hand, or by using a computer. So, let us record, as a complement to Conclusion 4.9:

Conclusion

Monte Carlo integration is smarter than Riemann integration, because the symmetries of the function can fool Riemann, but not Monte Carlo.

All this is good to know, when computing integrals in practice, especially with a computer. Finally, here is one more useful interpretation of the integral:

Theorem

The integral of a function [math]f:[a,b]\to\mathbb R[/math] is given by

[[math]] \int_a^bf(x)dx=(b-a)\times E(f) [[/math]]
where [math]E(f)[/math] is the expectation of [math]f[/math], regarded as a random variable.


Show Proof

This is just some sort of fancy reformulation of Theorem 4.8, the idea being that what we can “expect” from a random variable is of course its average. We will be back to this later in this book, when systematically discussing probability theory.

4b. Riemann sums

Our purpose now will be to understand which functions [math]f:\mathbb R\to\mathbb R[/math] are integrable, and how to compute their integrals. For this purpose, the Riemann formula in Theorem 4.6 will be our favorite tool. Let us begin with some theory. We first have:

Theorem

The following functions are integrable:

  • The piecewise continuous functions.
  • The piecewise monotone functions.


Show Proof

This is indeed something quite standard, as follows:


(1) It is enough to prove the first assertion for a function [math]f:[a,b]\to\mathbb R[/math] which is continuous, and our claim here is that this follows from the uniform continuity of [math]f[/math]. To be more precise, given [math]\varepsilon \gt 0[/math], let us choose [math]\delta \gt 0[/math] such that the following happens:

[[math]] |x-y| \lt \delta\implies|f(x)-f(y)| \lt \varepsilon [[/math]]


In order to prove the result, let us pick two divisions of [math][a,b][/math], as follows:

[[math]] I=[a=a_1 \lt a_2 \lt \ldots \lt a_n=b] [[/math]]

[[math]] I'=[a=a_1' \lt a_2' \lt \ldots \lt a_m'=b] [[/math]]


Our claim, which will prove the result, is that if these divisions are sharp enough, of resolution [math] \lt \delta/2[/math], then the associated Riemann sums [math]\Sigma_I(f),\Sigma_{I'}(f)[/math] are close, within [math]\varepsilon(b-a)[/math]:

[[math]] a_{i+1}-a_i \lt \frac{\delta}{2}\ ,\ a_{i+1}'-a_i' \lt \frac{\delta}{2}\ \implies\ \big|\Sigma_I(f)-\Sigma_{I'}(f)\big| \lt \varepsilon(b-a) [[/math]]


(2) In order to prove this claim, let us denote by [math]l[/math] the length of the intervals on the real line. Our assumption is that the lengths of the divisions [math]I,I'[/math] satisfy:

[[math]] l\big([a_i,a_{i+1}]\big) \lt \frac{\delta}{2}\quad,\quad l\big([a_i',a_{i+1}']\big) \lt \frac{\delta}{2} [[/math]]


Now let us intersect the intervals of our divisions [math]I,I'[/math], and set:

[[math]] l_{ij}=l\big([a_i,a_{i+1}]\cap[a_j',a_{j+1}']\big) [[/math]]


The difference of Riemann sums that we are interested in is then given by:

[[math]] \begin{eqnarray*} \Big|\Sigma_I(f)-\Sigma_{I'}(f)\Big| &=&\left|\sum_{ij}l_{ij}f(a_i)-\sum_{ij}l_{ij}f(a_j')\right|\\ &=&\left|\sum_{ij}l_{ij}(f(a_i)-f(a_j'))\right| \end{eqnarray*} [[/math]]


(3) Now let us estimate [math]f(a_i)-f(a_j')[/math]. Since in the case [math]l_{ij}=0[/math] we do not need this estimate, we can assume [math]l_{ij} \gt 0[/math]. Now by remembering what the definition of the numbers [math]l_{ij}[/math] was, we conclude that we have at least one point [math]x\in\mathbb R[/math] satisfying:

[[math]] x\in [a_i,a_{i+1}]\cap[a_j',a_{j+1}'] [[/math]]


But then, by using this point [math]x[/math] and our assumption on [math]I,I'[/math] involving [math]\delta[/math], we get:

[[math]] \begin{eqnarray*} |a_i-a_j'| &\leq&|a_i-x|+|x-a_j'|\\ &\leq&\frac{\delta}{2}+\frac{\delta}{2}\\ &=&\delta \end{eqnarray*} [[/math]]


Thus, according to our definition of [math]\delta[/math] from (1), in relation to [math]\varepsilon[/math], we get:

[[math]] |f(a_i)-f(a_j')| \lt \varepsilon [[/math]]


(4) But this is what we need, in order to finish. Indeed, with the estimate that we found, we can finish the computation started in (2), as follows:

[[math]] \begin{eqnarray*} \Big|\Sigma_I(f)-\Sigma_{I'}(f)\Big| &=&\left|\sum_{ij}l_{ij}(f(a_i)-f(a_j'))\right|\\ &\leq&\varepsilon\sum_{ij}l_{ij}\\ &=&\varepsilon(b-a) \end{eqnarray*} [[/math]]


Thus our two Riemann sums are close enough, provided that they are both chosen to be fine enough, and this finishes the proof of the first assertion.


(5) Regarding now the second assertion, this is something more technical, that we will not really need in what follows. We will leave the proof here, which uses similar ideas to those in the proof of (1) above, namely subdivisions and estimates, as an exercise.

Going ahead with more theory, let us establish some abstract properties of the integration operation. We already know from Proposition 4.5 that the integrals behave well with respect to sums and multiplication by scalars. Along the same lines, we have:

Theorem

The integrals behave well with respect to taking limits,

[[math]] \int_a^b\left(\lim_{n\to\infty}f_n(x)\right)dx=\lim_{n\to\infty}\int_a^bf_n(x)dx [[/math]]
and with respect to taking infinite sums as well,

[[math]] \int_a^b\left(\sum_{n=0}^\infty f_n(x)\right)dx=\sum_{n=0}^\infty\int_a^bf_n(x)dx [[/math]]
with both these formulae being valid, under mild assumptions.


Show Proof

This is something quite standard, by using the general theory developed in chapter 3 for the sequences and series of functions. To be more precise, (1) follows by using the material there, via Riemann sums, and then (2) follows as a particular case of (1). We will leave the clarification of all this as an instructive exercise.

Finally, still at the general level, let us record as well the following result:

Theorem

Given a continuous function [math]f:[a,b]\to\mathbb R[/math], we have

[[math]] \exists c\in[a,b]\quad,\quad \int_a^bf(x)dx=(b-a)f(c) [[/math]]
with this being called mean value property.


Show Proof

Our claim is that this follows from the following trivial estimate:

[[math]] \min(f)\leq f\leq\max(f) [[/math]]


Indeed, by integrating this over [math][a,b][/math], we obtain the following estimate:

[[math]] (b-a)\min(f)\leq\int_a^bf(x)dx\leq(b-a)\max(f) [[/math]]


Now observe that this latter estimate can be written as follows:

[[math]] \min(f)\leq\frac{\int_a^bf(x)dx}{b-a}\leq\max(f) [[/math]]


Since [math]f[/math] is continuous, it takes all values in [math][\min(f),\max(f)][/math], so we get a [math]c\in[a,b][/math] such that:

[[math]] \frac{\int_a^bf(x)dx}{b-a}=f(c) [[/math]]


Thus, we are led to the conclusion in the statement.

At the level of examples now, let us first look at the simplest functions that we know, namely the power functions [math]f(x)=x^p[/math]. However, things here are tricky, as follows:

Theorem

We have the integration formula

[[math]] \int_a^bx^pdx=\frac{b^{p+1}-a^{p+1}}{p+1} [[/math]]
valid at [math]p=0,1,2,3[/math].


Show Proof

This is something quite tricky, the idea being as follows:


(1) By linearity we can assume that our interval [math][a,b][/math] is of the form [math][0,c][/math], and the formula that we want to establish is as follows:

[[math]] \int_0^cx^pdx=\frac{c^{p+1}}{p+1} [[/math]]


(2) We can further assume [math]c=1[/math], and by expressing the left term as a Riemann sum, we are in need of the following estimate, in the [math]N\to\infty[/math] limit:

[[math]] 1^p+2^p+\ldots+N^p\simeq \frac{N^{p+1}}{p+1} [[/math]]


(3) So, let us try to prove this. At [math]p=0[/math], obviously nothing to do, because we have the following formula, which is exact, and which proves our estimate:

[[math]] 1^0+2^0+\ldots+N^0=N [[/math]]


(4) At [math]p=1[/math] now, we are confronted with a well-known question, namely the computation of [math]1+2+\ldots+N[/math]. But this is simplest done by arguing that the average of the numbers [math]1,2,\ldots,N[/math] being the number in the middle, we have:

[[math]] \frac{1+2+\ldots+N}{N}=\frac{N+1}{2} [[/math]]


Thus, we obtain the following formula, which again solves our question:

[[math]] 1+2+\ldots+N=\frac{N(N+1)}{2}\simeq\frac{N^2}{2} [[/math]]


(5) At [math]p=2[/math] now, go compute [math]1^2+2^2+\ldots+N^2[/math]. This is not obvious at all, so as a preliminary here, let us go back to the case [math]p=1[/math], and try to find a new proof there, which might have some chances to extend at [math]p=2[/math]. The trick is to use 2D geometry. Indeed, consider the following picture, with stacks going from 1 to [math]N[/math]:

[[math]] \xymatrix@R=0pt@C=0pt{ &&&&\square\\ &&&&\vdots\\ &&\square&\ldots&\square\\ &\square&\square&\ldots&\square\\ \square&\square&\square&\ldots&\square } [[/math]]


Now if we take two copies of this, and put them one on top of the other, with a twist, in the obvious way, we obtain a rectangle having size [math]N\times(N+1)[/math]. Thus:

[[math]] 2(1+2+\ldots+N)=N(N+1) [[/math]]


But this gives the same formula as before, solving our question, namely:

[[math]] 1+2+\ldots+N=\frac{N(N+1)}{2}\simeq\frac{N^2}{2} [[/math]]


(6) Armed with this new method, let us attack now the case [math]p=2[/math]. Here we obviously need to do some 3D geometry, namely taking the picture [math]P[/math] formed by a succession of solid squares, having sizes [math]1\times 1[/math], [math]2\times2[/math], [math]3\times3[/math], and so on up to [math]N\times N[/math]. Some quick thinking suggests that stacking 3 copies of [math]P[/math], with some obvious twists, will lead us to a parallelepiped. But this is not exactly true, and some further thinking shows that what we have to do is to add 3 more copies of [math]P[/math], leading to the following formula:

[[math]] 1^2+2^2+\ldots+N^2=\frac{N(N+1)(2N+1)}{6} [[/math]]


Or at least, that's how the legend goes. In practice, the above formula holds indeed, and you can check it for instance by recurrence, and this solves our problem:

[[math]] 1^2+2^2+\ldots+N^2\simeq\frac{2N^3}{6}=\frac{N^3}{3} [[/math]]



(7) At [math]p=3[/math] now, the legend goes that by deeply thinking in 4D we are led to the following formula, a bit as in the cases [math]p=1,2[/math], explained above:

[[math]] 1^3+2^3+\ldots+N^3=\left(\frac{N(N+1)}{2}\right)^2 [[/math]]


Alternatively, assuming that the gods of combinatorics are with us, we can see right away the following formula, which coupled with (4) gives the result:

[[math]] 1^3+2^3+\ldots+N^3=(1+2+\ldots+N)^2 [[/math]]


In any case, in practice, the above formula holds indeed, and you can check it for instance by recurrence, and this solves our problem:

[[math]] 1^3+2^3+\ldots+N^3\simeq\frac{N^4}{4} [[/math]]


(8) Thus, good news, we proved our theorem. Of course, I can hear you screaming, what about [math]p=4[/math] and higher. But the thing is that, by a strange twist of fate, the exact formulae for [math]1^p+2^p+\ldots+N^p[/math] at [math]p=4[/math] and higher, which do exist, due to Faulhaber, are no longer of nice product type, amenable to geometric proofs as above. Thus, game over.
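This being said, nothing stops us from checking the estimate [math]1^p+2^p+\ldots+N^p\simeq N^{p+1}/(p+1)[/math] numerically, at [math]p=4[/math] and higher, and even at non-integer exponents. Here is a minimal Python sketch, supporting the conjecture formulated below:

```python
# Numerical check of 1^p + 2^p + ... + N^p ~ N^(p+1)/(p+1), for a few
# sample exponents p, including non-integer ones.
N = 10**6
for p in (4, 5, 0.5, 2.5):
    ratio = sum(k**p for k in range(1, N + 1)) / (N**(p + 1) / (p + 1))
    print(p, ratio)  # all ratios close to 1
```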

What happened above, with us unable to integrate [math]x^p[/math] at [math]p=4[/math] and higher, not to mention the exponents [math]p\in\mathbb R-\mathbb N[/math] that we have not even dared to talk about, is quite annoying. As a conclusion to all this, however, let us formulate:

Conjecture

We have the following estimate,

[[math]] 1^p+2^p+\ldots+N^p\simeq \frac{N^{p+1}}{p+1} [[/math]]

and so, by Riemann sums, we have the following integration formula,

[[math]] \int_a^bx^pdx=\frac{b^{p+1}-a^{p+1}}{p+1} [[/math]]

valid for any exponent [math]p\in\mathbb N[/math], and perhaps for some other [math]p\in\mathbb R[/math].

We will see later that this conjecture is indeed true, and with the exact details regarding the exponents [math]p\in\mathbb R-\mathbb N[/math] too. Now, instead of struggling with this, let us look at some other functions, which are not polynomial. And here, as good news, we have:

Theorem

We have the following integration formula,

[[math]] \int_a^be^xdx=e^b-e^a [[/math]]
valid for any two real numbers [math]a \lt b[/math].


Show Proof

This follows indeed from the Riemann integration formula, because:

[[math]] \begin{eqnarray*} \int_a^be^xdx &=&\lim_{N\to\infty}\frac{b-a}{N}\left(e^a+e^{a+(b-a)/N}+e^{a+2(b-a)/N}+\ldots+e^{a+(N-1)(b-a)/N}\right)\\ &=&\lim_{N\to\infty}\frac{(b-a)e^a}{N}\cdot\left(1+e^{(b-a)/N}+e^{2(b-a)/N}+\ldots+e^{(N-1)(b-a)/N}\right)\\ &=&\lim_{N\to\infty}\frac{(b-a)e^a}{N}\cdot\frac{e^{b-a}-1}{e^{(b-a)/N}-1}\\ &=&(e^b-e^a)\lim_{N\to\infty}\frac{b-a}{N(e^{(b-a)/N}-1)}\\ &=&e^b-e^a \end{eqnarray*} [[/math]]

with the last equality coming from the standard limit [math]N(e^{(b-a)/N}-1)\to b-a[/math], with [math]N\to\infty[/math].


Thus, we are led to the conclusion in the statement.

4c. Advanced results

The problem is now, what to do with what we have, namely Conjecture 4.16 and Theorem 4.17. Not obvious, so stuck, and time to ask the cat. And cat says:

Cat

Summing the infinitesimals of the rate of change of the function should give you the global change of the function. Obvious.

Which is quite puzzling, because usually my cat is quite helpful. Guess he must be either a reincarnation of Newton or Leibnitz, these gentlemen used to talk like that, or that I should take care at some point of my garden, and remove catnip and other weeds.


This being said, wait. There is a suggestion here to connect integrals and derivatives, and this is in fact what we have, coming from Conjecture 4.16 and Theorem 4.17, due to:

[[math]] \left(\frac{x^{p+1}}{p+1}\right)'=x^p\quad,\quad (e^x)'=e^x [[/math]]


So, eureka, we have our idea, thanks cat. Moving ahead now, following this idea, we first have the following result, called fundamental theorem of calculus:

Theorem

Given a continuous function [math]f:[a,b]\to\mathbb R[/math], if we set

[[math]] F(x)=\int_a^xf(s)ds [[/math]]
then [math]F'=f[/math]. That is, the derivative of the integral is the function itself.


Show Proof

This follows from the Riemann integration picture, and more specifically, from the mean value property from Theorem 4.14. Indeed, we have:

[[math]] \frac{F(x+t)-F(x)}{t}=\frac{1}{t}\int_x^{x+t}f(s)ds [[/math]]


On the other hand, our function [math]f[/math] being continuous, by using the mean value property from Theorem 4.14, we can find a number [math]c\in[x,x+t][/math] such that:

[[math]] \frac{1}{t}\int_x^{x+t}f(s)ds=f(c) [[/math]]


Thus, putting our formulae together, we conclude that we have:

[[math]] \frac{F(x+t)-F(x)}{t}=f(c) [[/math]]


Now with [math]t\to0[/math], no matter how the number [math]c\in[x,x+t][/math] varies, one thing that we can be sure about is that we have [math]c\to x[/math]. Thus, by continuity of [math]f[/math], we obtain:

[[math]] \lim_{t\to0}\frac{F(x+t)-F(x)}{t}=f(x) [[/math]]


But this means exactly that we have [math]F'=f[/math], and we are done.
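As a quick numerical illustration of this theorem, here is a minimal sketch in Python, constructing [math]F[/math] via Riemann sums, and then checking [math]F'\simeq f[/math] via a difference quotient, for the sample choice [math]f=\cos[/math]:

```python
import math

f, a = math.cos, 0.0

# F(x) = integral of f from a to x, computed via a plain Riemann sum.
def F(x, N=10**4):
    return (x - a) / N * sum(f(a + (x - a) * k / N) for k in range(1, N + 1))

# Difference quotient of F at x = 1, which should be close to f(1) = cos(1).
x, h = 1.0, 1e-4
print((F(x + h) - F(x)) / h, f(x))  # both ~0.5403
```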

We have as well the following result, which is something equivalent, and a hair more beautiful, also called fundamental theorem of calculus:

Theorem

Given a function [math]F:\mathbb R\to\mathbb R[/math], with continuous derivative, we have

[[math]] \int_a^bF'(x)dx=F(b)-F(a) [[/math]]
for any interval [math][a,b][/math].


Show Proof

As already mentioned, this is something which follows from Theorem 4.19, and is in fact equivalent to it. Indeed, consider the following function:

[[math]] G(s)=\int_a^sF'(x)dx [[/math]]


By using Theorem 4.19 we have [math]G'=F'[/math], and so our functions [math]F,G[/math] differ by a constant. But with [math]s=a[/math] we have [math]G(a)=0[/math], and so the constant is [math]F(a)[/math], and we get:

[[math]] F(s)=G(s)+F(a) [[/math]]


Now with [math]s=b[/math] this gives [math]F(b)=G(b)+F(a)[/math], which reads:

[[math]] F(b)=\int_a^bF'(x)dx+F(a) [[/math]]


Thus, we are led to the conclusion in the statement.

As a first illustration for all this, solving our previous problems, we have:

Theorem

We have the following integration formulae,

[[math]] \int_a^bx^pdx=\frac{b^{p+1}-a^{p+1}}{p+1}\quad,\quad \int_a^b\frac{1}{x}\,dx=\log\left(\frac{b}{a}\right) [[/math]]

[[math]] \int_a^b\sin x\,dx=\cos a-\cos b\quad,\quad \int_a^b\cos x\,dx=\sin b-\sin a [[/math]]

[[math]] \int_a^be^xdx=e^b-e^a\quad,\quad \int_a^b\log x\,dx=b\log b-a\log a-b+a [[/math]]
all obtained, in case you ever forget them, via the fundamental theorem of calculus.


Show Proof

We already know some of these formulae, but the best is to do everything, using the fundamental theorem of calculus. The computations go as follows:


(1) With [math]F(x)=x^{p+1}[/math] we have [math]F'(x)=(p+1)x^p[/math], and we get, as desired:

[[math]] \int_a^b(p+1)x^p\,dx=b^{p+1}-a^{p+1} [[/math]]


(2) Observe first that the formula (1) does not work at [math]p=-1[/math]. However, here, assuming [math]0 \lt a \lt b[/math], we can use [math]F(x)=\log x[/math], having as derivative [math]F'(x)=1/x[/math], which gives, as desired:

[[math]] \int_a^b\frac{1}{x}\,dx=\log b-\log a=\log\left(\frac{b}{a}\right) [[/math]]


(3) With [math]F(x)=\cos x[/math] we have [math]F'(x)=-\sin x[/math], and we get, as desired:

[[math]] \int_a^b-\sin x\,dx=\cos b-\cos a [[/math]]


(4) With [math]F(x)=\sin x[/math] we have [math]F'(x)=\cos x[/math], and we get, as desired:

[[math]] \int_a^b\cos x\,dx=\sin b-\sin a [[/math]]


(5) With [math]F(x)=e^x[/math] we have [math]F'(x)=e^x[/math], and we get, as desired:

[[math]] \int_a^be^x\,dx=e^b-e^a [[/math]]


(6) This is something more tricky. We are looking for a function satisfying:

[[math]] F'(x)=\log x [[/math]]


This does not look doable, but fortunately the answer to such things can be found on the internet. But, what if the internet connection is down? So, let us think a bit, and try to solve our problem. Speaking logarithms and derivatives, what we know is:

[[math]] (\log x)'=\frac{1}{x} [[/math]]


But then, in order to make appear [math]\log[/math] on the right, the idea is quite clear, namely multiplying on the left by [math]x[/math]. We obtain in this way the following formula:

[[math]] (x\log x)'=1\cdot\log x+x\cdot\frac{1}{x}=\log x+1 [[/math]]


We are almost there, all we have to do now is to subtract [math]x[/math] from the left, so as to get:

[[math]] (x\log x-x)'=\log x [[/math]]


But with this formula in hand, we can go back to our problem, and we get the result.

Getting back now to theory, inspired by the above, let us formulate:

Definition

Given a function [math]f[/math], we call primitive of [math]f[/math] any function [math]F[/math] satisfying:

[[math]] F'=f [[/math]]
We denote such primitives by [math]\int f[/math], and also call them indefinite integrals.

Observe that the primitives are unique up to an additive constant, in the sense that if [math]F[/math] is a primitive, then so is [math]F+c[/math], for any [math]c\in\mathbb R[/math], and conversely, if [math]F,G[/math] are two primitives, then we must have [math]G=F+c[/math], for some [math]c\in\mathbb R[/math], with this latter fact coming from a result from chapter 3, saying that a function whose derivative vanishes must be constant.


As for the convention at the end, [math]F=\int f[/math], this comes from the fundamental theorem of calculus, which can be written as follows, by using this convention:

[[math]] \int_a^bf(x)dx=\left(\int f\right)(b)-\left(\int f\right)(a) [[/math]]


By the way, observe that there is no contradiction here, coming from the indeterminacy of [math]\int f[/math]. Indeed, when adding a constant [math]c\in\mathbb R[/math] to the chosen primitive [math]\int f[/math], when computing the above difference the [math]c[/math] quantities will cancel, and we will obtain the same result.


As an application, we can reformulate Theorem 4.21 in a more digestible form, as follows:

Theorem

We have the following formulae for primitives,

[[math]] \int x^p=\frac{x^{p+1}}{p+1}\quad,\quad\int\frac{1}{x}=\log x [[/math]]

[[math]] \int\sin x=-\cos x\quad,\quad \int\cos x=\sin x [[/math]]

[[math]] \int e^x=e^x\quad,\quad\int\log x=x\log x-x [[/math]]
allowing us to compute the corresponding definite integrals too.


Show Proof

Here the various formulae in the statement follow from Theorem 4.21, or rather from the proof of Theorem 4.21, or even from chapter 3, for most of them, and the last assertion comes from the integration formula given after Definition 4.22.
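As a side remark, all these primitives can be double-checked by machine, if in need. Here is a minimal sketch using the sympy Python library, assuming it is available, with the primitives from the above table, at [math]p=3[/math] for the first one, each verified by differentiation:

```python
import sympy as sp

x = sp.symbols('x', positive=True)

# The primitives from the above table, each checked by differentiation:
# the printed outputs should be x**3, 1/x, sin(x), cos(x), exp(x), log(x).
for F in (x**4 / 4, sp.log(x), -sp.cos(x), sp.sin(x), sp.exp(x),
          x * sp.log(x) - x):
    print(sp.diff(F, x))
```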

Getting back now to theory, we have the following key result:

Theorem

We have the formula

[[math]] \int f'g+\int fg'=fg [[/math]]
called integration by parts.


Show Proof

This follows by integrating the Leibnitz formula, namely:

[[math]] (fg)'=f'g+fg' [[/math]]


Indeed, with our convention for primitives, this gives the formula in the statement.
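As a quick illustration for this formula, assume that we are looking for a primitive of [math]xe^x[/math]. With [math]f=e^x[/math] and [math]g=x[/math] we have [math]f'=e^x[/math] and [math]g'=1[/math], and the above formula gives:

[[math]] \int xe^x=xe^x-\int e^x=xe^x-e^x [[/math]]

Observe also that the formula [math]\int\log x=x\log x-x[/math], found before by some trickery, comes in this way too, with [math]f=x[/math] and [math]g=\log x[/math].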

It is then possible to pass to usual integrals, and we obtain a formula here as well, as follows, also called integration by parts, with the convention [math][\varphi]_a^b=\varphi(b)-\varphi(a)[/math]:

[[math]] \int_a^bf'g+\int_a^bfg'=\Big[fg\Big]_a^b [[/math]]


In practice, the most interesting case is that when [math]fg[/math] vanishes on the boundary [math]\{a,b\}[/math] of our interval, leading to the following formula:

[[math]] \int_a^bf'g=-\int_a^bfg' [[/math]]


Examples of this usually come with [math][a,b]=[-\infty,\infty][/math], and more on this later. Now still at the theoretical level, we have as well the following result:

Theorem

We have the change of variable formula

[[math]] \int_a^bf(x)dx=\int_c^df(\varphi(t))\varphi'(t)dt [[/math]]
where [math]c=\varphi^{-1}(a)[/math] and [math]d=\varphi^{-1}(b)[/math].


Show Proof

This follows with [math]f=F'[/math], from the following differentiation rule, that we know from chapter 3, and whose proof is something elementary:

[[math]] (F\circ\varphi)'(t)=F'(\varphi(t))\varphi'(t) [[/math]]


Indeed, by integrating between [math]c[/math] and [math]d[/math], we obtain the result.
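As an illustration, let us go back to the formula [math]\int_{-1}^1\sqrt{1-x^2}\,dx=\pi/2[/math] from Proposition 4.3, which was so far resisting our calculus methods. With the change of variable [math]x=\sin t[/math], that is, with [math]\varphi=\sin[/math], we obtain:

[[math]] \int_{-1}^1\sqrt{1-x^2}\,dx=\int_{-\pi/2}^{\pi/2}\cos^2t\,dt=\int_{-\pi/2}^{\pi/2}\frac{1+\cos 2t}{2}\,dt=\frac{\pi}{2} [[/math]]

Thus, the half-disc area can now be recovered by pure calculus, with no geometry needed.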

As a main application now of our theory, in relation with advanced calculus, and more specifically with the Taylor formula from chapter 3, we have:

Theorem

Given a function [math]f:\mathbb R\to\mathbb R[/math], we have the formula

[[math]] f(x+t)=\sum_{k=0}^n\frac{f^{(k)}(x)}{k!}\,t^k+\int_x^{x+t}\frac{f^{(n+1)}(s)}{n!}(x+t-s)^n\,ds [[/math]]
called Taylor formula with integral formula for the remainder.


Show Proof

This is something which looks a bit complicated, so we will first do some verifications, and then we will go for the proof in general:


(1) At [math]n=0[/math] the formula in the statement is as follows, and certainly holds, due to the fundamental theorem of calculus, which gives [math]\int_x^{x+t}f'(s)ds=f(x+t)-f(x)[/math]:

[[math]] f(x+t)=f(x)+\int_x^{x+t}f'(s)ds [[/math]]


(2) At [math]n=1[/math], the formula in the statement becomes more complicated, as follows:

[[math]] f(x+t)=f(x)+f'(x)t+\int_x^{x+t}f''(s)(x+t-s)ds [[/math]]


As a first observation, this formula holds indeed for the linear functions, where we have [math]f(x+t)=f(x)+f'(x)t[/math], and [math]f''=0[/math]. So, let us try [math]f(x)=x^2[/math]. Here we have:

[[math]] f(x+t)-f(x)-f'(x)t=(x+t)^2-x^2-2xt=t^2 [[/math]]


On the other hand, the integral remainder is given by the same formula, namely:

[[math]] \begin{eqnarray*} \int_x^{x+t}f''(s)(x+t-s)ds &=&2\int_x^{x+t}(x+t-s)ds\\ &=&2t(x+t)-2\int_x^{x+t}sds\\ &=&2t(x+t)-((x+t)^2-x^2)\\ &=&2tx+2t^2-2tx-t^2\\ &=&t^2 \end{eqnarray*} [[/math]]


(3) Still at [math]n=1[/math], let us try now to prove the formula in the statement, in general. Since what we have to prove is an equality, the first thought goes towards differentiating, and this works indeed. When differentiating both sides with respect to [math]t[/math], the left side gives [math]f'(x+t)[/math]. On the right, the term [math]f'(x)t[/math] contributes [math]f'(x)[/math], and the remainder contributes [math]\int_x^{x+t}f''(s)ds=f'(x+t)-f'(x)[/math], the boundary term vanishing because the integrand vanishes at [math]s=x+t[/math]. Thus both sides have the same derivative, and since they agree at [math]t=0[/math], we obtain the result.


(4) In general, the proof is similar, by differentiating, the computations being similar to those at [math]n=1[/math], and we will leave this as an instructive exercise.
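As for numerics, here is a minimal Python check of the above formula, for the sample choices [math]f=\exp[/math], [math]x=0[/math], [math]t=1[/math], [math]n=3[/math], with the remainder integral evaluated via a Riemann sum:

```python
import math

n, t = 3, 1.0

# Taylor part: sum of t^k/k!, for k = 0,...,n, for f = exp at x = 0.
taylor = sum(t**k / math.factorial(k) for k in range(n + 1))

# Remainder: integral of exp(s)(t-s)^n/n! over [0,t], via a Riemann sum.
N = 10**5
remainder = sum((t / N) * math.exp(t * k / N) * (t - t * k / N)**n
                for k in range(1, N + 1)) / math.factorial(n)

print(taylor + remainder, math.e)  # both ~2.71828
```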

So long for basic integration theory. As a first concrete application now, we can compute all sorts of areas and volumes. Normally such computations are the business of multivariable calculus, and we will be back to this later, but with the technology that we have so far, we can do a number of things. As a first such computation, we have:

Proposition

The area of an ellipse, given by the equation

[[math]] \left(\frac{x}{a}\right)^2+\left(\frac{y}{b}\right)^2=1 [[/math]]
with [math]a,b \gt 0[/math] being half the size of a box containing the ellipse, is [math]A=\pi ab[/math].


Show Proof

The idea is that of cutting the ellipse into vertical slices. First observe that, according to our equation [math](x/a)^2+(y/b)^2=1[/math], the [math]x[/math] coordinate can range as follows:

[[math]] x\in[-a,a] [[/math]]


For any such [math]x[/math], the other coordinate [math]y[/math], satisfying [math](x/a)^2+(y/b)^2=1[/math], is given by:

[[math]] y=\pm b\sqrt{1-\frac{x^2}{a^2}} [[/math]]


Thus the length of the vertical ellipse slice at [math]x[/math] is given by the following formula:

[[math]] l(x)=2b\sqrt{1-\frac{x^2}{a^2}} [[/math]]


We conclude from this discussion that the area of the ellipse is given by:

[[math]] \begin{eqnarray*} A &=&2b\int_{-a}^a\sqrt{1-\frac{x^2}{a^2}}\,dx\\ &=&\frac{4b}{a}\int_0^a\sqrt{a^2-x^2}\,dx\\ &=&4ab\int_0^1\sqrt{1-y^2}\,dy\\ &=&4ab\cdot\frac{\pi}{4}\\ &=&\pi ab \end{eqnarray*} [[/math]]


Finally, as a verification, for [math]a=b=1[/math] we get [math]A=\pi[/math], as we should.

Moving now to 3D, as an obvious challenge here, we can try to compute the volume of the sphere. This can be done a bit as for the ellipse, the answer being as follows:

Theorem

The volume of the unit sphere is given by:

[[math]] V=\frac{4\pi}{3} [[/math]]
More generally, the volume of the sphere of radius [math]R[/math] is [math]V=4\pi R^3/3[/math].


Show Proof

We proceed a bit as for the ellipse. The equation of the sphere is:

[[math]] x^2+y^2+z^2=1 [[/math]]


Thus, the range of the first coordinate [math]x[/math] is as follows:

[[math]] x\in[-1,1] [[/math]]


Now when this first coordinate [math]x[/math] is fixed, the other coordinates [math]y,z[/math] vary on a circle, given by the equation [math]y^2+z^2=1-x^2[/math], and so having radius as follows:

[[math]] r(x)=\sqrt{1-x^2} [[/math]]


Thus, the vertical slice of our sphere at [math]x[/math] has area as follows:

[[math]] a(x)=\pi r(x)^2=\pi(1-x^2) [[/math]]


We conclude from this discussion that the volume of the sphere is given by:

[[math]] \begin{eqnarray*} V &=&\pi\int_{-1}^1(1-x^2)dx\\ &=&\pi\int_{-1}^1\left(x-\frac{x^3}{3}\right)'dx\\ &=&\pi\left[\left(1-\frac{1}{3}\right)-\left(-1+\frac{1}{3}\right)\right]\\ &=&\pi\left(\frac{2}{3}+\frac{2}{3}\right)\\ &=&\frac{4\pi}{3} \end{eqnarray*} [[/math]]


Finally, the last assertion is clear too, by multiplying everything by [math]R[/math], which amounts to multiplying the final result of our volume computation by [math]R^3[/math].

As another application of our integration methods, we can now solve the 1D wave equation. In order to explain this, we will need a standard calculus result, as follows:

Proposition

The derivative of a function of type

[[math]] \varphi(x)=\int_{g(x)}^{h(x)}f(s)ds [[/math]]
is given by the formula [math]\varphi'(x)=f(h(x))h'(x)-f(g(x))g'(x)[/math].


Show Proof

Consider a primitive of the function that we integrate, [math]F'=f[/math]. We have:

[[math]] \begin{eqnarray*} \varphi(x) &=&\int_{g(x)}^{h(x)}f(s)ds\\ &=&\int_{g(x)}^{h(x)}F'(s)ds\\ &=&F(h(x))-F(g(x)) \end{eqnarray*} [[/math]]


By using now the chain rule for derivatives, we obtain from this:

[[math]] \begin{eqnarray*} \varphi'(x) &=&F'(h(x))h'(x)-F'(g(x))g'(x)\\ &=&f(h(x))h'(x)-f(g(x))g'(x) \end{eqnarray*} [[/math]]


Thus, we are led to the formula in the statement.

Now back to the 1D waves, the result here, due to d'Alembert, is as follows:

Theorem

The solution of the 1D wave equation [math]\ddot{\varphi}=v^2\varphi''[/math] with initial value conditions [math]\varphi(x,0)=f(x)[/math] and [math]\dot{\varphi}(x,0)=g(x)[/math] is given by the d'Alembert formula:

[[math]] \varphi(x,t)=\frac{f(x-vt)+f(x+vt)}{2}+\frac{1}{2v}\int_{x-vt}^{x+vt}g(s)ds [[/math]]
Moreover, in the context of our previous lattice model discretizations, what happens is more or less that the above d'Alembert integral gets computed via Riemann sums.


Show Proof

There are several things going on here, the idea being as follows:


(1) Let us first check that the d'Alembert solution is indeed a solution of the wave equation [math]\ddot{\varphi}=v^2\varphi''[/math]. The first time derivative is computed as follows:

[[math]] \dot{\varphi}(x,t)=\frac{-vf'(x-vt)+vf'(x+vt)}{2}+\frac{1}{2v}(vg(x+vt)+vg(x-vt)) [[/math]]


The second time derivative is computed as follows:

[[math]] \ddot{\varphi}(x,t)=\frac{v^2f''(x-vt)+v^2f''(x+vt)}{2}+\frac{vg'(x+vt)-vg'(x-vt)}{2} [[/math]]


Regarding now space derivatives, the first one is computed as follows:

[[math]] \varphi'(x,t)=\frac{f'(x-vt)+f'(x+vt)}{2}+\frac{1}{2v}(g(x+vt)-g(x-vt)) [[/math]]


As for the second space derivative, this is computed as follows:

[[math]] \varphi''(x,t)=\frac{f''(x-vt)+f''(x+vt)}{2}+\frac{g'(x+vt)-g'(x-vt)}{2v} [[/math]]


Thus we have indeed [math]\ddot{\varphi}=v^2\varphi''[/math]. As for the initial conditions, [math]\varphi(x,0)=f(x)[/math] is clear from our definition of [math]\varphi[/math], and [math]\dot{\varphi}(x,0)=g(x)[/math] is clear from our above formula of [math]\dot{\varphi}[/math].


(2) Conversely now, we can simply solve our equation, which among others will double-check the computations in (1). Let us make the following change of variables:

[[math]] \xi=x-vt\quad,\quad\eta=x+vt [[/math]]


With this change of variables, which is quite tricky, mixing space and time variables, our wave equation [math]\ddot{\varphi}=v^2\varphi''[/math] reformulates in a very simple way, as follows:

[[math]] \frac{\partial^2\varphi}{\partial\xi\,\partial\eta}=0 [[/math]]


But this latter equation tells us that our new [math]\xi,\eta[/math] variables get separated, and we conclude from this that the solution must be of the following special form:

[[math]] \varphi(x,t)=F(\xi)+G(\eta)=F(x-vt)+G(x+vt) [[/math]]


Now by taking into account the initial conditions [math]\varphi(x,0)=f(x)[/math] and [math]\dot{\varphi}(x,0)=g(x)[/math], and then integrating, we are led to the d'Alembert formula. Finally, in what regards the last assertion, we will leave the study here as an instructive exercise.
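In relation with this last assertion, here is a minimal Python sketch of the d'Alembert formula in action, with the integral term computed via a Riemann sum, and with the initial data [math]f,g[/math] being sample choices:

```python
import math

v = 1.0
f = lambda x: math.exp(-x**2)  # sample initial shape: a bump at the origin
g = lambda x: 0.0              # sample initial velocity: zero

# The d'Alembert solution, with its integral term done via a Riemann sum.
def phi(x, t, N=10**4):
    a, b = x - v * t, x + v * t
    integral = (b - a) / N * sum(g(a + (b - a) * k / N) for k in range(1, N + 1))
    return (f(a) + f(b)) / 2 + integral / (2 * v)

print(phi(0.0, 2.0))  # the bump splits into two travelling halves: ~exp(-4)
```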

4d. Some probability

As another application of the integration theory developed above, let us develop now some theoretical probability theory. You probably know, from real life, what probability is. But in practice, when trying to axiomatize this, in mathematical terms, things can be quite tricky. So, here comes our point, the definition saving us is as follows:

Definition

A probability density is a function [math]\varphi:\mathbb R\to\mathbb R[/math] satisfying

[[math]] \varphi\geq0\qquad,\qquad \int_\mathbb R\varphi(x)dx=1 [[/math]]
with the convention that we allow Dirac masses, [math]\delta_x[/math] with [math]x\in\mathbb R[/math], as components of [math]\varphi[/math].

To be more precise, in what regards the convention at the end, which is something of a physics flavor, this states that our density function [math]\varphi:\mathbb R\to\mathbb R[/math] must be a combination as follows, with [math]\psi:\mathbb R\to\mathbb R[/math] being a usual function, and with [math]\alpha_i,x_i\in\mathbb R[/math]:

[[math]] \varphi=\psi+\sum_i\alpha_i\delta_{x_i} [[/math]]


Assuming that [math]x_i[/math] are distinct, and with the usual convention that the Dirac masses integrate up to 1, the conditions on our density function [math]\varphi:\mathbb R\to\mathbb R[/math] are as follows:

[[math]] \psi\geq 0\quad,\quad \alpha_i\geq0\quad,\quad \int_\mathbb R\psi(x)dx+\sum_i\alpha_i=1 [[/math]]


Observe the obvious relation with intuitive probability theory, where the probability for something to happen is always positive, [math]P\geq0[/math], and where the overall probability for something to happen, with this meaning for one of the possible events to happen, is of course [math]\Sigma P=1[/math], and this because life goes on, and something must happen, right.


In short, what we are proposing with Definition 4.31 is some sort of continuous generalization of basic probability theory, coming from coins, dice and cards, that you surely know. Moving now ahead, let us formulate, as a continuation of Definition 4.31:

Definition

We say that a random variable [math]f[/math] follows the density [math]\varphi[/math] if

[[math]] P(f\in[a,b])=\int_a^b\varphi(x)dx [[/math]]
holds, for any interval [math][a,b]\subset\mathbb R[/math].

With this, we are now one step closer to what we know from coins, dice, cards and so on. For instance when rolling a die, the corresponding density is as follows:

[[math]] \varphi=\frac{1}{6}\left(\delta_1+\delta_2+\delta_3+\delta_4+\delta_5+\delta_6\right) [[/math]]



In what regards now the random variables [math]f[/math], described as above by densities [math]\varphi[/math], the first questions regard their mean and variance, constructed as follows:

Definition

Given a random variable [math]f[/math], with probability density [math]\varphi[/math]:

  • Its mean is the quantity [math]M=\int_\mathbb Rx\varphi(x)\,dx[/math].
  • More generally, its [math]k[/math]-th moment is [math]M_k=\int_\mathbb Rx^k\varphi(x)\,dx[/math].
  • Its variance is the quantity [math]V=M_2-M_1^2[/math].

Before going further, with more theory and examples, let us observe that, in both Definition 4.32 and Definition 4.33, what really matters is not the density [math]\varphi[/math] itself, but rather the related quantity [math]\mu=\varphi(x)dx[/math]. So, let us upgrade our formalism, as follows:

Definition (upgrade)

A real probability measure is a quantity of the following type, with [math]\psi\geq 0[/math], [math]\alpha_i\geq0[/math] and [math]x_i\in\mathbb R[/math], satisfying [math]\int_\mathbb R\psi(x)dx+\sum_i\alpha_i=1[/math]:

[[math]] \mu=\psi(x)dx+\sum_i\alpha_i\delta_{x_i} [[/math]]
We say that a random variable [math]f[/math] follows [math]\mu[/math] when [math]P(f\in[a,b])=\int_a^bd\mu(x)[/math]. In this case

[[math]] M_k=\int_\mathbb Rx^k d\mu(x) [[/math]]
are called moments of [math]f[/math], and [math]M=M_1[/math] and [math]V=M_2-M_1^2[/math] are called mean, and variance.

In practice now, let us look for some illustrations for this. The simplest random variables are those following discrete laws, [math]\psi=0[/math], and as a basic example here, when flipping a coin and being rewarded [math]\$0[/math] for heads, and [math]\$1[/math] for tails, the corresponding law is [math]\mu=\frac{1}{2}(\delta_0+\delta_1)[/math]. More generally, playing the same game with a biased coin, which lands on heads with probability [math]p\in(0,1)[/math], leads to the following law, called Bernoulli law:

[[math]] \mu=p\delta_0+(1-p)\delta_1 [[/math]]


Many more things can be said here, notably with a study of what happens when you play the game [math]n[/math] times in a row, leading to some sort of powers of the Bernoulli laws, called binomial laws. Skipping some discussion here, and getting straight to the point, the most important laws in discrete probability are the Poisson laws, constructed as follows:

Definition

The Poisson law of parameter [math]1[/math] is the following measure,

[[math]] p_1=\frac{1}{e}\sum_{k\in\mathbb N}\frac{\delta_k}{k!} [[/math]]
and more generally, the Poisson law of parameter [math]t \gt 0[/math] is the following measure,

[[math]] p_t=e^{-t}\sum_{k\in\mathbb N}\frac{t^k}{k!}\,\delta_k [[/math]]
with the letter “p” standing for Poisson.

Observe that our laws have indeed mass 1, as they should, and this is due to:

[[math]] e^t=\sum_{k\in\mathbb N}\frac{t^k}{k!} [[/math]]


In general, the idea with the Poisson laws is that these appear a bit everywhere, in real life, the reasons for this coming from the Poisson Limit Theorem (PLT). However, this theorem uses advanced calculus, and we will leave it for later. In the meantime, however, we can have some fun with moments, the result here being as follows:

Theorem

The moments of [math]p_1[/math] are the Bell numbers,

[[math]] M_k(p_1)=|P(k)| [[/math]]
where [math]P(k)[/math] is the set of partitions of [math]\{1,\ldots,k\}[/math]. More generally, we have

[[math]] M_k(p_t)=\sum_{\pi\in P(k)}t^{|\pi|} [[/math]]
for any [math]t \gt 0[/math], where [math]|.|[/math] is the number of blocks.


Show Proof

The moments of [math]p_1[/math] satisfy the following recurrence formula:

[[math]] \begin{eqnarray*} M_{k+1} &=&\frac{1}{e}\sum_r\frac{(r+1)^{k+1}}{(r+1)!}\\ &=&\frac{1}{e}\sum_r\frac{r^k}{r!}\left(1+\frac{1}{r}\right)^k\\ &=&\frac{1}{e}\sum_r\frac{r^k}{r!}\sum_s\binom{k}{s}r^{-s}\\ &=&\sum_s\binom{k}{s}\cdot\frac{1}{e}\sum_r\frac{r^{k-s}}{r!}\\ &=&\sum_s\binom{k}{s}M_{k-s} \end{eqnarray*} [[/math]]


With this done, let us try now to find a recurrence for the Bell numbers, [math]B_k=|P(k)|[/math]. Since a partition of [math]\{1,\ldots,k+1\}[/math] appears by choosing [math]s[/math] neighbors for [math]1[/math], among the [math]k[/math] numbers available, and then partitioning the [math]k-s[/math] elements left, we have:

[[math]] B_{k+1}=\sum_s\binom{k}{s}B_{k-s} [[/math]]


Since the initial values coincide, [math]M_1=B_1=1[/math] and [math]M_2=B_2=2[/math], we obtain by recurrence [math]M_k=B_k[/math], as claimed. Regarding now the law [math]p_t[/math] with [math]t \gt 0[/math], we have here a similar recurrence formula for the moments, as follows:

[[math]] \begin{eqnarray*} M_{k+1} &=&e^{-t}\sum_r\frac{t^{r+1}(r+1)^{k+1}}{(r+1)!}\\ &=&e^{-t}\sum_r\frac{t^{r+1}r^k}{r!}\left(1+\frac{1}{r}\right)^k\\ &=&e^{-t}\sum_r\frac{t^{r+1}r^k}{r!}\sum_s\binom{k}{s}r^{-s}\\ &=&\sum_s\binom{k}{s}\cdot e^{-t}\sum_r\frac{t^{r+1}r^{k-s}}{r!}\\ &=&t\sum_s\binom{k}{s}M_{k-s} \end{eqnarray*} [[/math]]


Regarding the initial values, the first moment of [math]p_t[/math] is given by:

[[math]] M_1 =e^{-t}\sum_r\frac{t^rr}{r!} =e^{-t}\sum_r\frac{t^r}{(r-1)!} =t [[/math]]


Now by using the above recurrence we obtain from this:

[[math]] M_2 =t\sum_s\binom{1}{s}M_{1-s} =t(1+t) =t+t^2 [[/math]]


On the other hand, some standard combinatorics, a bit as before at [math]t=1[/math], shows that the numbers in the statement [math]S_k=\sum_{\pi\in P(k)}t^{|\pi|}[/math] satisfy the same recurrence relation, and with the same initial values. Thus we have [math]M_k=S_k[/math], as claimed.
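Here is also a minimal Python verification of this result, at [math]t=1[/math], with the moments of [math]p_1[/math] computed by truncating the defining series, and with the Bell numbers computed via the recurrence from the above proof:

```python
import math

# k-th moment of the Poisson law p_1, via truncation of its series.
def moment(k, cutoff=100):
    return sum(r**k / math.factorial(r) for r in range(cutoff)) / math.e

# Bell numbers, via the recurrence B_{k+1} = sum_s binom(k,s) B_{k-s}.
bell = [1]
for k in range(8):
    bell.append(sum(math.comb(k, s) * bell[k - s] for s in range(k + 1)))

for k in range(1, 8):
    print(k, round(moment(k)), bell[k])  # matching: 1, 2, 5, 15, 52, 203, 877
```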

The above result looks quite exciting, making a link between subtle probability, and subtle combinatorics. We will be back to it, after learning more calculus.

General references

Banica, Teo (2024). "Calculus and applications". arXiv:2401.00911 [math.CO].