Tail Characteristics

Expected Loss and Mean Excess Loss

In probability theory, the expected value of a random variable is, intuitively, the long-run average value of repetitions of the experiment it represents. For example, within an insurance context, the expected loss can be thought of as the average loss incurred by an insurer on a very large portfolio of policies sharing a common loss distribution (similar risk profile). More formally, the law of large numbers states that the arithmetic mean of the observed values almost surely converges to the expected value as the number of repetitions approaches infinity.

The expected value does not exist for random variables with certain heavy-tailed distributions, such as the Cauchy distribution.[1] For such random variables, the heavy tails prevent the defining sum or integral from converging. That being said, most loss models encountered in insurance implicitly assume finite expected losses.

The expected value is also known as the expectation, mathematical expectation, EV, average, mean value, mean, or first moment.

Expected Values

We give a brief mathematical review of expected values and provide some special formulas that apply for loss variables.

Univariate discrete random variable

Let [math]X[/math] be a discrete random variable taking values [math]x_1,x_2,\ldots[/math] with probabilities [math]p_1,p_2,\ldots[/math] respectively. Then the expected value of this random variable is the infinite sum

[[math]] \operatorname{E}[X] = \sum_{i=1}^\infty x_i\, p_i,[[/math]]

provided that this series converges absolutely (that is, the sum must remain finite if we were to replace all [math]x_i[/math]s with their absolute values). If this series does not converge absolutely, we say that the expected value of [math]X[/math] does not exist.
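For instance, consider a simple (hypothetical) loss variable that takes the value 0 with probability 0.90, the value 100 with probability 0.09, and the value 1,000 with probability 0.01. Its expected value is

[[math]] \operatorname{E}[X] = 0 \cdot 0.90 + 100 \cdot 0.09 + 1000 \cdot 0.01 = 19.[[/math]]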

Univariate continuous random variable

If the probability distribution of [math]X[/math] admits a probability density function [math]f(x)[/math], then the expected value can be computed as

[[math]] \operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, \mathrm{d}x . [[/math]]
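As an illustration, suppose the loss [math]X[/math] is exponentially distributed with density [math]f(x) = \lambda e^{-\lambda x}[/math] for [math]x \geq 0[/math]. Integration by parts gives

[[math]] \operatorname{E}[X] = \int_0^\infty x\, \lambda e^{-\lambda x}\, \mathrm{d}x = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\, \mathrm{d}x = \frac{1}{\lambda}.[[/math]]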

Properties

  • The expected value of a constant is equal to the constant itself; i.e., if [math]c[/math] is a constant, then [math]\operatorname{E}[c]=c[/math].
  • If [math]X[/math] and [math]Y[/math] are random variables such that [math]X \le Y[/math] almost surely, then [math]\operatorname{E}[X] \le \operatorname{E}[Y][/math].
  • The expected value operator (or expectation operator) [math]\operatorname{E}[\cdot][/math] is linear in the sense that

[[math]]\begin{align*} \operatorname{E}[X + c] &= \operatorname{E}[X] + c \\ \operatorname{E}[X + Y] &= \operatorname{E}[X] + \operatorname{E}[Y] \\ \operatorname{E}[aX] &= a \operatorname{E}[X] \end{align*}[[/math]]

Combining the previous three equations, we see that

[[math]]\operatorname{E}[a X + b Y + c] = a \operatorname{E}[X] + b \operatorname{E}[Y] + c\,[[/math]]

for any two random variables [math]X[/math] and [math]Y[/math] and any real numbers [math]a[/math],[math]b[/math] and [math]c[/math].

Layer Cake Representation

When a continuous random variable [math]X[/math] takes only non-negative values, we can use the following formula for computing its expectation (even when the expectation is infinite):

[[math]] \operatorname{E}[X]=\int_0^\infty \operatorname{P}(X \ge x)\; \mathrm{d}x[[/math]]

Similarly, when a random variable takes only values in {0, 1, 2, 3, ...} we can use the following formula for computing its expectation:

[[math]] \operatorname{E}[X]=\sum\limits_{i=1}^\infty \operatorname{P}(X\geq i).[[/math]]
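To illustrate the first formula, suppose [math]X[/math] is exponential with [math]\operatorname{P}(X \ge x) = e^{-\lambda x}[/math]; then

[[math]] \operatorname{E}[X]=\int_0^\infty e^{-\lambda x}\, \mathrm{d}x = \frac{1}{\lambda},[[/math]]

which agrees with computing [math]\int_0^\infty x\, \lambda e^{-\lambda x}\, \mathrm{d}x[/math] directly. Similarly, if [math]X[/math] counts the number of failures before the first success in independent trials with success probability [math]1-q[/math], so that [math]\operatorname{P}(X \geq i) = q^i[/math], the second formula gives [math]\operatorname{E}[X] = \sum_{i=1}^\infty q^i = \frac{q}{1-q}[/math].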

Residual Life Distribution

Suppose [math]X[/math] is a non-negative random variable which can be thought of as representing the lifetime of some entity of interest. A family of residual life distributions can be constructed by considering the conditional distribution of [math]X[/math] given that [math]X[/math] is beyond some level [math]d[/math], i.e., the distribution of the lifetime given that death (failure) has not yet occurred at time [math]d[/math]:

[[math]] \begin{align} R_d(t) &= \operatorname{P}(X \leq d + t \mid X \gt d) \\ &= \frac{S(d) - S(d+t)}{S(d)} \end{align} [[/math]]

with [math]S(t)[/math] denoting the survival function of [math]X[/math], i.e., the probability that [math]X[/math] is greater than [math]t[/math] (the lifetime exceeds [math]t[/math]).

Residual life distributions are relevant for insurance policies with deductibles. Since a claim is made only when the loss to the insured exceeds the deductible, the loss to the insurer given that a claim was made, [math]X - d[/math] given [math]X \gt d[/math], has distribution function precisely equal to the residual life distribution [math]R_d(t)[/math].
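As a simple illustration, suppose the loss is exponential with survival function [math]S(t) = e^{-\lambda t}[/math]. Then

[[math]] R_d(t) = \frac{S(d) - S(d+t)}{S(d)} = \frac{e^{-\lambda d} - e^{-\lambda (d+t)}}{e^{-\lambda d}} = 1 - e^{-\lambda t},[[/math]]

so the residual life distribution does not depend on [math]d[/math]; this is the memoryless property of the exponential distribution.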

Mean Excess Loss Function

If [math]X[/math] represents loss to the insured with an insurance policy with a deductible [math]d[/math], then the expected loss to the insurer given that a claim was made is the mean excess loss function evaluated at [math]d[/math]:

[[math]] m(d) = \operatorname{E}[X-d \mid X \gt d] = \int_{0}^{\infty}\frac{S(t + d)}{S(d)} \,dt \,. [[/math]]

This function is also called the mean residual life function when [math]X[/math] is a general non-negative random variable. When the distribution of [math]X[/math] has a density say [math]f(x)[/math], then the mean excess loss function equals

[[math]] m(d) = \frac{\int_{d}^{\infty} (x-d) f(x) \, dx}{S(d)} \,. [[/math]]
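For example, for an exponential loss with [math]S(t) = e^{-\lambda t}[/math], the mean excess loss function is constant:

[[math]] m(d) = \int_{0}^{\infty}\frac{e^{-\lambda(t+d)}}{e^{-\lambda d}} \, dt = \int_{0}^{\infty} e^{-\lambda t} \, dt = \frac{1}{\lambda}.[[/math]]

By contrast, for a Pareto loss with [math]S(t) = \left(\frac{\theta}{t+\theta}\right)^{\alpha}[/math] and [math]\alpha \gt 1[/math], a similar computation gives [math]m(d) = \frac{d + \theta}{\alpha - 1}[/math], which increases linearly in [math]d[/math], a sign of a heavier tail.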

Moments

In mathematics, a moment is a specific quantitative measure, used in both mechanics and statistics, of the shape of a set of points. If the points represent probability density, then the zeroth moment is the total probability (i.e. one), the first moment is the mean, the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis. The mathematical concept is closely related to the concept of moment in physics.

For a bounded distribution of mass or probability, the collection of all the moments (of all orders, from 0 to [math]\infty[/math]) uniquely determines the distribution.

Significance of the moments

The [math]n[/math]-th moment of a real-valued continuous function [math]f(x)[/math] of a real variable about a value [math]c[/math] is

[[math]]\mu_n=\int_{-\infty}^\infty (x - c)^n\,f(x)\,dx.[[/math]]

The moment of a function, without further explanation, usually refers to the above expression with [math]c[/math] = 0.

For the second and higher moments, the central moments (moments about the mean, with [math]c[/math] being the mean) are usually used rather than the moments about zero, because they provide clearer information about the distribution's shape.

The [math]n[/math]-th moment about zero of a probability density function [math]f(x)[/math] is the expected value of [math]X^n[/math] and is called a raw moment or crude moment.[2] The moments about its mean [math]\mu[/math] are called central moments; these describe the shape of the function, independently of translation.
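For example, the second central moment (the variance) can be written in terms of the first two raw moments:

[[math]] \operatorname{E}\left[(X - \mu)^2\right] = \operatorname{E}[X^2] - 2\mu\operatorname{E}[X] + \mu^2 = \operatorname{E}[X^2] - \mu^2.[[/math]]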

If [math]f[/math] is a probability density function, then the value of the integral above is called the [math]n[/math]-th moment of the probability distribution. More generally, if [math]F[/math] is the cumulative distribution function of any probability distribution, which may not have a density function, then the [math]n[/math]-th moment of the probability distribution is given by the Riemann–Stieltjes integral

[[math]]\mu'_n = \operatorname{E} \left [ X^n \right ] =\int_{-\infty}^\infty x^n\,dF(x)\,[[/math]]

where [math]X[/math] is a random variable with cumulative distribution function [math]F[/math], and [math]\operatorname{E}[/math] is the expectation operator or mean.

When

[[math]]\operatorname{E}\left [\left |X^n \right | \right ] = \int_{-\infty}^\infty |x^n|\,dF(x) = \infty,[[/math]]

then the moment is said not to exist. If the [math]n[/math]-th moment about any point exists, so does the ([math]n[/math] − 1)-th moment (and thus, all lower-order moments) about every point.

The zeroth moment of any probability density function is 1, since the area under any probability density function must equal one.

Hazard Function

In actuarial science, the hazard function represents the instantaneous rate of mortality at a certain age, measured on an annualized basis.

Motivation and definition

In a life table, we consider the probability of a person dying from age [math]x[/math] to [math]x[/math] + 1, called [math]q_x[/math]. In the continuous case, we could also consider the conditional probability of a person who has attained age ([math]x[/math]) dying between ages [math]x[/math] and [math]x + \Delta x [/math], which is

[[math]]P_{x}(\Delta x)=\operatorname{P}(x \lt X \lt x+\Delta x \mid X \gt x)=\frac{F_X(x+\Delta x)-F_X(x)}{1-F_X(x)}[[/math]]

where [math]F_X(x)[/math] is the cumulative distribution function of the continuous age-at-death random variable, [math]X[/math]. As [math]\Delta x[/math] tends to zero, so does this probability in the continuous case. The approximate hazard rate is this probability divided by [math]\Delta x[/math]. If we let [math]\Delta x [/math] tend to zero, we get the hazard function, denoted by [math]h(x)[/math]:

[[math]]h(x)= \lim_{\Delta x \rightarrow 0} \frac{F_X(x+\Delta\;x)-F_X(x)}{\Delta x (1-F_X(x))} = \frac{F'_X(x)}{1-F_X(x)}[[/math]]

Since [math]f_X(x)=F'_X(x)[/math] is the probability density function of [math]X[/math], and [math]S(x)=1-F_X(x)[/math] is the survival function, the hazard rate can also be expressed variously as:

[[math]]h(x)=\frac{f_X(x)}{1-F_X(x)}=-\frac{S'(x)}{S(x)}=-{\frac{d}{dx}}\ln[S(x)].[[/math]]
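For instance, if [math]X[/math] is exponentially distributed with [math]S(x) = e^{-\lambda x}[/math] and [math]f_X(x) = \lambda e^{-\lambda x}[/math], then

[[math]] h(x) = \frac{\lambda e^{-\lambda x}}{e^{-\lambda x}} = \lambda,[[/math]]

a constant hazard rate (force of mortality) that does not vary with age.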

To understand conceptually how the hazard rate operates within a population, note that at ages [math]x[/math] where the probability density function [math]f_X(x)[/math] is zero, there is no chance of dying, so the hazard rate at those ages is zero. The hazard rate [math]h(x)[/math] uniquely determines the probability density function [math]f_X(x)[/math].

The hazard rate [math]h(x)[/math] can be interpreted as the conditional density of failure at age [math]x[/math], while [math]f(x)[/math] is the unconditional density of failure at age [math]x[/math].[3] The unconditional density of failure at age [math]x[/math] is the product of the probability of survival to age [math]x[/math], and the conditional density of failure at age [math]x[/math], given survival to age [math]x[/math].

This is expressed in symbols as

[[math]]h(x)S(x) = f_X(x)[[/math]]

or equivalently

[[math]]h(x) = \frac{f_X(x)}{S(x)}.[[/math]]

In many instances, it is also desirable to determine the survival probability function when the hazard rate is known. To do this, integrate the hazard rate over the interval [math]x[/math] to [math]x+t[/math].

[[math]] \int_{x}^{x+t} h(y) \, dy = \int_{x}^{x+t} -\frac{d}{dy} \ln[S(y)]\, dy. [[/math]]

By the fundamental theorem of calculus, the integral on the right evaluates to [math]\ln[S(x)] - \ln[S(x+t)][/math]; equivalently,

[[math]] -\int_{x}^{x+t} h(y) \, dy = \ln[S(x + t)] - \ln[S(x)]. [[/math]]

Let us denote

[[math]] S_x(t) = \frac{S(x+t)}{S(x)}, [[/math]]

then, exponentiating both sides of the previous equation, the survival probability of an individual of age [math]x[/math], expressed in terms of the hazard rate, is

[[math]] S_x(t) = \exp \left(-\int_x^{x+t}h(y)\, dy\, \right). [[/math]]
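In particular, setting [math]x = 0[/math] (and recalling [math]S(0) = 1[/math] for a non-negative lifetime) recovers the survival function and the density directly from the hazard rate:

[[math]] S(t) = \exp\left(-\int_0^{t} h(y)\, dy\right), \qquad f_X(t) = h(t)\exp\left(-\int_0^{t} h(y)\, dy\right),[[/math]]

which makes explicit the earlier claim that the hazard rate uniquely determines the density.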

Examples

The following examples give the hazard rate and the corresponding survival function [math]S_x(t)[/math] for some common distributions:

  • Exponential: [math]h(y) = \lambda[/math] and [math]S_x(t) = e^{-\int_x^{x+t} \lambda\, dy} = e^{-\lambda t}[/math].
  • Gamma: [math]h(y) = \frac{y^{\alpha-1} e^{-y}}{\Gamma(\alpha) - \gamma(\alpha, y)}[/math], where [math]\gamma(\alpha,y)[/math] is the lower incomplete gamma function.
  • Weibull: [math]h(y) = \alpha \lambda^\alpha y^{\alpha-1}[/math], where [math]\alpha \gt 0[/math], and [math]S_x(t) = e^{-\int_x^{x+t} h(y)\, dy} = A(x) e^{ - (\lambda (x+t))^\alpha }[/math], where [math]A(x) = e^{(\lambda x)^{\alpha}}[/math].

Classification of the Tails

The tail of a loss distribution refers to the portion of the distribution corresponding to very large losses, and it is commonly classified as light or heavy. There are various ways of making this classification based on the concepts covered on this page:


  • Moments: the tail is light if all moments exist, i.e., [math]\operatorname{E}[X^k] \lt \infty [/math] for all [math]k \gt 0[/math]; it is heavy if only a finite number of moments exist, i.e., there exists [math]K\geq 0[/math] such that [math]\operatorname{E}[X^k] = \infty [/math] for all [math]k \geq K [/math].
  • Hazard rate function: the tail is light if the hazard rate function [math]h(x)[/math] is increasing, i.e., [math]h(x_1) \leq h(x_2) [/math] for any [math]x_1 \lt x_2 [/math]; it is heavy if [math]h(x)[/math] is decreasing, i.e., [math]h(x_1) \geq h(x_2) [/math] for any [math]x_1 \lt x_2 [/math].
  • Mean excess loss function: the tail is light if the mean excess loss function [math]m(x)[/math] is decreasing, i.e., [math]m(x_1) \geq m(x_2) [/math] for any [math]x_1 \lt x_2 [/math]; it is heavy if [math]m(x)[/math] is increasing, i.e., [math]m(x_1) \leq m(x_2) [/math] for any [math]x_1 \lt x_2 [/math].
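For example, the exponential distribution with rate [math]\lambda[/math] has all moments finite,

[[math]] \operatorname{E}[X^k] = \frac{k!}{\lambda^k} \lt \infty \quad \text{for all integers } k \geq 1,[[/math]]

together with a constant hazard rate and a constant mean excess loss function. A Pareto loss with shape parameter [math]\alpha[/math] and scale parameter [math]\theta[/math], by contrast, has [math]\operatorname{E}[X^k] = \infty[/math] for all [math]k \geq \alpha[/math], a decreasing hazard rate [math]h(x) = \frac{\alpha}{x+\theta}[/math], and an increasing mean excess loss function, so it is classified as heavy-tailed by all three criteria.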

Similarly, we can compare the tails of two distributions using the same concepts used to classify them:

  • Limiting tail behavior: compare the rate of decrease of the survival functions. If [math]\lim_{x \rightarrow \infty} \frac{S_2(x)}{S_1(x)} = 0 [/math], then the loss distribution corresponding to [math]S_2(x)[/math] has a lighter tail.
  • Hazard rate function: compare the rate of increase/decrease of the hazard rate functions. If [math]\lim_{x \rightarrow \infty} \frac{h_2(x)}{h_1(x)} = 0 [/math], then the loss distribution corresponding to [math]h_2(x)[/math] has a heavier tail.
  • Mean excess loss function: compare the rate of increase/decrease of the mean excess loss functions. If [math]\lim_{x \rightarrow \infty} \frac{m_2(x)}{m_1(x)} = 0 [/math], then the loss distribution corresponding to [math]m_2(x)[/math] has a lighter tail.
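As an illustration, comparing a Pareto survival function [math]S_1(x) = \left(\frac{\theta}{x+\theta}\right)^\alpha[/math] with an exponential survival function [math]S_2(x) = e^{-\lambda x}[/math] gives

[[math]] \lim_{x \rightarrow \infty} \frac{S_2(x)}{S_1(x)} = \lim_{x \rightarrow \infty} \frac{(x+\theta)^{\alpha}\, e^{-\lambda x}}{\theta^{\alpha}} = 0,[[/math]]

since the exponential decays faster than any power; hence the exponential distribution has a lighter tail than the Pareto distribution.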

References

  1. Hamming, R. W. (1991). "Example 8.7–1 The Cauchy distribution". The Art of Probability for Scientists and Engineers. Addison-Wesley. p. 290 ff. ISBN 0-201-40686-1. "Sampling from the Cauchy distribution and averaging gets you nowhere — one sample has the same distribution as the average of 1000 samples!"
  2. "Raw Moment". MathWorld. http://mathworld.wolfram.com/RawMoment.html
  3. Cunningham, R.; Herzog, T.; London, R. (2008). Models for Quantifying Risk (3rd ed.). Actex.

Wikipedia References

  • Wikipedia contributors. "Expected value". Wikipedia. Retrieved 22 August 2022.
  • Wikipedia contributors. "Failure rate". Wikipedia. Retrieved 22 August 2022.
  • Wikipedia contributors. "Moment (mathematics)". Wikipedia. Retrieved 22 August 2022.