Continuous Distributions
Normal Distribution
The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known.^{[1]}^{[2]}
The normal distribution is useful because of the central limit theorem. In its most general form, under some conditions (which include finite variance), it states that averages of random variables independently drawn from independent distributions converge in distribution to the normal, that is, become normally distributed when the number of random variables is sufficiently large. Physical quantities that are expected to be the sum of many independent processes (such as measurement errors) often have distributions that are nearly normal.^{[3]} Moreover, many results and methods (such as propagation of uncertainty and least squares parameter fitting) can be derived analytically in explicit form when the relevant variables are normally distributed.
The normal distribution is sometimes informally called the bell curve. However, many other distributions are bell-shaped (such as the Cauchy, Student's t, and logistic distributions). The terms Gaussian function and Gaussian bell curve are also ambiguous because they sometimes refer to multiples of the normal distribution that cannot be directly interpreted in terms of probabilities.
The probability density of the normal distribution is:
[math]f(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}[/math]
where:
- [math]\mu[/math] is the mean or expectation of the distribution (and also its median and mode)
- [math]\sigma[/math] is the standard deviation
- [math]\sigma^2[/math] is the variance
A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.
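As a minimal sketch, the density of a normal distribution with mean [math]\mu[/math] and standard deviation [math]\sigma[/math] can be evaluated directly with Python's standard library. The function name `normal_pdf` is illustrative and does not come from any particular library.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x:
    exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Height of the standard normal density at its mode, 1/sqrt(2*pi).
peak = normal_pdf(0.0)
```

Doubling [math]\sigma[/math] halves the height at the mode, which is the [math]1/\sigma[/math] scaling that keeps the total area equal to one.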
Definition
Standard normal distribution
The simplest case of a normal distribution is known as the standard normal distribution. This is a special case when [math]\mu=0[/math] and [math]\sigma=1[/math], and it is described by this probability density function:
[math]\phi(x) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}x^2}[/math]
The factor [math]1/\sqrt{2\pi}[/math] in this expression ensures that the total area under the curve [math]\phi(x)[/math] is equal to one.^{[4]} The ½ in the exponent ensures that the distribution has unit variance (and therefore also unit standard deviation). This function is symmetric around [math]x=0[/math], where it attains its maximum value [math]1/\sqrt{2\pi}[/math]; and has inflection points at +1 and −1.
General normal distribution
Every normal distribution is a version of the standard normal distribution whose domain has been stretched by a factor [math]\sigma[/math] (the standard deviation) and then translated by [math]\mu[/math] (the mean value):
[math]f(x \mid \mu, \sigma^2) = \frac{1}{\sigma} \, \phi\left(\frac{x-\mu}{\sigma}\right)[/math]
The probability density must be scaled by [math]1/\sigma[/math] so that the integral is still 1.
If [math]Z[/math] is a standard normal deviate, then [math]X = Z\sigma + \mu[/math] will have a normal distribution with expected value [math]\mu[/math] and standard deviation [math]\sigma[/math]. Conversely, if [math]X[/math] is a general normal deviate, then [math]Z = (X-\mu)/\sigma[/math] will have a standard normal distribution.
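The two transformations can be checked by simulation. The sketch below assumes illustrative values [math]\mu = 10[/math] and [math]\sigma = 2[/math]; the helper names `to_general` and `to_standard` are hypothetical. It draws standard normal deviates, rescales them, and confirms that standardizing recovers the original values.

```python
import random
import statistics

def to_general(z: float, mu: float, sigma: float) -> float:
    """Map a standard normal deviate Z to X = Z*sigma + mu."""
    return z * sigma + mu

def to_standard(x: float, mu: float, sigma: float) -> float:
    """Map a general normal deviate X back to Z = (X - mu) / sigma."""
    return (x - mu) / sigma

rng = random.Random(0)      # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0       # illustrative parameters
zs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
xs = [to_general(z, mu, sigma) for z in zs]

sample_mean = statistics.fmean(xs)   # should be close to mu
sample_sd = statistics.pstdev(xs)    # should be close to sigma
round_trip_error = max(abs(to_standard(x, mu, sigma) - z) for x, z in zip(xs, zs))
```

The sample mean and standard deviation only approximate [math]\mu[/math] and [math]\sigma[/math]; the agreement improves as the sample grows.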
Notation
The standard Gaussian distribution (with zero mean and unit variance) is often denoted with the Greek letter [math]\phi[/math] (phi).^{[5]} The alternative form of the Greek phi letter, [math]\varphi[/math], is also used quite often.
The normal distribution is also often denoted by [math]N(\mu,\sigma^2)[/math].^{[6]} Thus when a random variable [math]X[/math] is distributed normally with mean [math]\mu[/math] and variance [math]\sigma^2[/math], we write
[math]X \sim N(\mu, \sigma^2).[/math]
Properties
The normal distribution is a subclass of the elliptical distributions. The normal distribution is symmetric about its mean, and is non-zero over the entire real line. As such it may not be a suitable model for variables that are inherently positive or strongly skewed, such as the weight of a person or the price of a share. Such variables may be better described by other distributions, such as the log-normal distribution or the Pareto distribution.
The value of the normal distribution is practically zero when the value [math]x[/math] lies more than a few standard deviations away from the mean. Therefore, it may not be an appropriate model when one expects a significant fraction of outliers—values that lie many standard deviations away from the mean—and least squares and other statistical inference methods that are optimal for normally distributed variables often become highly unreliable when applied to such data. In those cases, a more heavy-tailed distribution should be assumed and the appropriate robust statistical inference methods applied.
The Gaussian distribution belongs to the family of stable distributions, which are the attractors of sums of independent, identically distributed random variables, whether or not the mean or variance is finite. Except for the Gaussian, which is a limiting case, all stable distributions have heavy tails and infinite variance. The Gaussian is one of the few stable distributions whose probability density function can be expressed analytically, the others being the Cauchy distribution and the Lévy distribution.
Symmetries and derivatives
The normal distribution [math]f(x)[/math], with any mean [math]\mu[/math] and any positive deviation [math]\sigma[/math], has the following properties:
- It is symmetric around the point [math]x = \mu [/math], which is at the same time the mode, the median and the mean of the distribution and it divides the data in half.^{[7]}
- It is unimodal: its first derivative is positive for [math]x \lt \mu [/math], negative for [math] x\gt \mu [/math], and zero only at [math]x = \mu [/math].
- The area under the curve and over the x-axis is unity.
- Its density has two inflection points (where the second derivative of [math]f[/math] is zero and changes sign), located one standard deviation away from the mean, namely at [math]x = \mu - \sigma[/math] and [math]x = \mu + \sigma [/math].^{[7]}
- Its density is log-concave.^{[7]}
- Its density is infinitely differentiable, indeed supersmooth of order 2.^{[8]}
- Its second derivative [math]f^{''}(x)[/math] is equal to twice its derivative with respect to its variance [math]\sigma^2[/math].
Moments
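If [math]X[/math] has a normal distribution, the moments exist and are finite for any [math]p[/math] whose real part is greater than −1. For any non-negative integer [math]p[/math], the plain central moments are
[math]\operatorname{E}\left[(X-\mu)^p\right] = \begin{cases} 0 & \text{if } p \text{ is odd,} \\ \sigma^p \, (p-1)!! & \text{if } p \text{ is even.} \end{cases}[/math]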
Here [math]n!![/math] denotes the double factorial, that is, the product of every number from [math]n[/math] to 1 that has the same parity as [math]n[/math].
Order | Non-central moment | Central moment |
---|---|---|
1 | [math]\mu[/math] | 0 |
2 | [math]\mu^2 + \sigma^2[/math] | [math]\sigma^2[/math] |
3 | [math]\mu^3 + 3\mu\sigma^2[/math] | 0 |
4 | [math]\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4[/math] | [math]3\sigma^4[/math] |
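The table entries can be reproduced from the double-factorial rule: odd central moments vanish and the even ones equal [math]\sigma^p (p-1)!![/math]. The sketch below is one way to do this; the function names are illustrative, and the non-central moments are obtained from the binomial expansion of [math](\mu + (X-\mu))^p[/math].

```python
import math

def double_factorial(n: int) -> int:
    """n!! = n (n-2) (n-4) ..., with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def central_moment(p: int, sigma: float) -> float:
    """E[(X - mu)^p] for X ~ N(mu, sigma^2) and a non-negative integer p."""
    if p % 2 == 1:
        return 0.0  # odd central moments vanish by symmetry
    return sigma ** p * double_factorial(p - 1)

def raw_moment(p: int, mu: float, sigma: float) -> float:
    """E[X^p], expanding X^p = (mu + (X - mu))^p with the binomial theorem."""
    return sum(
        math.comb(p, k) * mu ** (p - k) * central_moment(k, sigma)
        for k in range(p + 1)
    )

# Illustrative check with mu = 2, sigma = 3: fourth raw moment
# mu^4 + 6 mu^2 sigma^2 + 3 sigma^4 = 16 + 216 + 243.
fourth_raw = raw_moment(4, 2.0, 3.0)
```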
Moment generating functions
The moment generating function of a real random variable [math]X[/math] is the expected value of [math]e^{tX}[/math], as a function of the real parameter [math]t[/math]. For a normal distribution with mean [math]\mu[/math] and standard deviation [math]\sigma[/math], the moment generating function exists and is equal to
[math]M(t) = \operatorname{E}\left[e^{tX}\right] = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.[/math]
Cumulative distribution function
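The cumulative distribution function (CDF) of the standard normal distribution, usually denoted with the capital Greek letter [math]\Phi[/math] (phi), is the integral
[math]\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt.[/math]
In statistics one often uses the related error function, or erf([math]x[/math]), defined as the probability of a random variable with normal distribution of mean 0 and variance 1/2 falling in the range [math][-x, x][/math]; that is
[math]\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt.[/math]
These integrals cannot be expressed in terms of elementary functions, and are often said to be special functions. However, many numerical approximations are known.
The two functions are closely related, namely
[math]\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right].[/math]
For a generic normal distribution [math]f[/math] with mean [math]\mu[/math] and standard deviation [math]\sigma[/math], the cumulative distribution function is
[math]F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right].[/math]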
The complement of the standard normal CDF, [math]Q(x) = 1 - \Phi(x)[/math], is often called the Q-function, especially in engineering texts.^{[9]}^{[10]} It gives the probability that the value of a standard normal random variable will exceed [math]x[/math].
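Because the error function is available in Python's `math` module, the standard normal CDF, the CDF of a general normal distribution, and the Q-function can each be written in a line or two. This is a sketch that relies on the standard identity [math]\Phi(x) = \tfrac{1}{2}[1 + \operatorname{erf}(x/\sqrt{2})][/math]; the function names are illustrative.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), obtained by standardizing x first."""
    return std_normal_cdf((x - mu) / sigma)

def q_function(x: float) -> float:
    """Q(x) = 1 - Phi(x), written with erfc to limit cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))
```

Using `math.erfc` for the upper tail, rather than computing `1 - std_normal_cdf(x)`, is a design choice that keeps small tail probabilities from being lost to rounding.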
Standard deviation and tolerance intervals
About 68% of values drawn from a normal distribution are within one standard deviation [math]\sigma[/math] away from the mean; about 95% of the values lie within two standard deviations; and about 99.7% are within three standard deviations. This fact is known as the 68-95-99.7 (empirical) rule, or the 3-sigma rule.
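More precisely, the probability that a normal deviate lies in the interval [math][\mu - n\sigma, \mu + n\sigma][/math] is given by
[math]F(\mu + n\sigma) - F(\mu - n\sigma) = \Phi(n) - \Phi(-n) = \operatorname{erf}\left(\frac{n}{\sqrt{2}}\right).[/math]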
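The 68-95-99.7 figures follow from the fact that the probability of falling within [math]n[/math] standard deviations of the mean equals [math]\operatorname{erf}(n/\sqrt{2})[/math], whatever the values of [math]\mu[/math] and [math]\sigma[/math]. A short sketch (the name `prob_within` is illustrative):

```python
import math

def prob_within(n: float) -> float:
    """P(mu - n*sigma <= X <= mu + n*sigma) = erf(n / sqrt(2)) for normal X."""
    return math.erf(n / math.sqrt(2.0))

# Coverage for one, two and three standard deviations.
coverage = {n: prob_within(n) for n in (1, 2, 3)}
for n, p in coverage.items():
    print(f"within {n} sigma: {p:.4%}")
```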
Quantile function
The quantile function of a distribution is the inverse of the cumulative distribution function. The quantile function of the standard normal distribution is called the probit function, and can be expressed in terms of the inverse error function:
[math]\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \quad p \in (0,1).[/math]
For a normal random variable with mean [math]\mu[/math] and variance [math]\sigma^2[/math], the quantile function is
[math]F^{-1}(p) = \mu + \sigma\Phi^{-1}(p) = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(2p - 1), \quad p \in (0,1).[/math]
The quantile [math]\Phi^{-1}(p)[/math] of the standard normal distribution is commonly denoted as [math]z_p[/math]. A normal random variable [math]X[/math] will exceed [math]\mu + \sigma z_p[/math] with probability [math]1-p[/math]; and, for [math]p \gt 1/2[/math], will lie outside the interval [math]\mu \pm \sigma z_p[/math] with probability [math]2(1-p)[/math]. In particular, a normal random variable will lie outside the interval [math]\mu \pm 1.96\sigma[/math] in only 5% of cases.
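Python's standard library has no inverse error function, so the sketch below uses `statistics.NormalDist.inv_cdf` for the standard normal quantile instead of the erf-based expression, and then applies the location-scale rule [math]\mu + \sigma z_p[/math]. The wrapper name `normal_quantile` is illustrative.

```python
from statistics import NormalDist

def normal_quantile(p: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F^{-1}(p) = mu + sigma * z_p, where z_p is the standard normal quantile."""
    z_p = NormalDist().inv_cdf(p)  # defined only for 0 < p < 1
    return mu + sigma * z_p

# z_{0.975}: the two-sided 5% critical value, close to 1.96.
z_975 = normal_quantile(0.975)
```

For example, `normal_quantile(0.975, 80, 5)` gives the score below which 97.5% of values fall for a normal distribution with mean 80 and standard deviation 5.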
Examples
A professor's exam scores are approximately distributed normally with mean 80 and standard deviation 5. Using the cumulative standard normal table, we can answer the questions below.
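What is the probability that a student scores an 82 or less?
[math]P(X \le 82) = \Phi\left(\frac{82-80}{5}\right) = \Phi(0.4) \approx 0.6554[/math]
What is the probability that a student scores a 90 or more?
[math]P(X \ge 90) = 1 - \Phi\left(\frac{90-80}{5}\right) = 1 - \Phi(2) \approx 0.0228[/math]
What is the probability that a student scores a 74 or less?
[math]P(X \le 74) = \Phi\left(\frac{74-80}{5}\right) = \Phi(-1.2) \approx 0.1151[/math]
What is the probability that a student scores between 74 and 82?
[math]P(74 \le X \le 82) = \Phi(0.4) - \Phi(-1.2) \approx 0.6554 - 0.1151 = 0.5403[/math]
What is the probability that an average of three scores is 82 or less? Assuming the three scores are independent, their average is normal with mean 80 and standard deviation [math]5/\sqrt{3}[/math], so
[math]P(\bar{X} \le 82) = \Phi\left(\frac{82-80}{5/\sqrt{3}}\right) \approx \Phi(0.693) \approx 0.756[/math]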
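Instead of a printed table, the same questions can be answered numerically. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes that the three scores in the last question are independent, so that their average has standard deviation [math]5/\sqrt{3}[/math].

```python
import math
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=5)

p_le_82 = scores.cdf(82)                     # P(X <= 82)
p_ge_90 = 1 - scores.cdf(90)                 # P(X >= 90)
p_le_74 = scores.cdf(74)                     # P(X <= 74)
p_between = scores.cdf(82) - scores.cdf(74)  # P(74 <= X <= 82)

# Average of three independent scores: same mean, sd shrinks by sqrt(3).
mean_of_three = NormalDist(mu=80, sigma=5 / math.sqrt(3))
p_avg_le_82 = mean_of_three.cdf(82)          # P(average <= 82)

for label, p in [("<= 82", p_le_82), (">= 90", p_ge_90), ("<= 74", p_le_74),
                 ("74 to 82", p_between), ("mean of 3 <= 82", p_avg_le_82)]:
    print(f"{label}: {p:.4f}")
```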
Gamma Distribution
The gamma distribution is a two-parameter family of continuous probability distributions. The common exponential distribution and chi-squared distribution are special cases of the gamma distribution.
The parametrization with [math]\alpha[/math] and [math]\theta[/math] appears to be more common in econometrics and certain other applied fields, where e.g. the gamma distribution is frequently used to model waiting times. For instance, in life testing, the waiting time until death is a random variable that is frequently modeled with a gamma distribution.^{[11]}
If [math]\alpha[/math] is a positive integer, then the distribution represents an Erlang distribution; i.e., the sum of [math]\alpha[/math] independent exponentially distributed random variables, each of which has a mean of [math]\theta[/math].
Characterization using shape α and scale θ
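A random variable [math]X[/math] that is gamma-distributed with shape [math]\alpha[/math] and scale [math]\theta[/math] is denoted by
[math]X \sim \Gamma(\alpha, \theta) \equiv \textrm{Gamma}(\alpha, \theta).[/math]
The probability density function using the shape-scale parametrization is
[math]f(x;\alpha,\theta) = \frac{x^{\alpha-1} e^{-x/\theta}}{\theta^{\alpha}\,\Gamma(\alpha)} \quad \text{for } x \gt 0 \text{ and } \alpha, \theta \gt 0.[/math]
Here [math]\Gamma(\alpha)[/math] is the gamma function evaluated at [math]\alpha[/math].
The cumulative distribution function is the regularized gamma function:
[math]F(x;\alpha,\theta) = \int_0^x f(u;\alpha,\theta)\,du = \frac{\gamma\left(\alpha, \frac{x}{\theta}\right)}{\Gamma(\alpha)},[/math]
where [math]\gamma\left(\alpha, \frac{x}{\theta}\right)[/math] is the lower incomplete gamma function.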
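A minimal sketch of both functions in the shape-scale parametrization follows. The density is evaluated on the log scale with `math.lgamma`. The CDF uses the standard power series for the regularized lower incomplete gamma function, which is one of several possible methods and converges slowly when [math]x/\theta[/math] is much larger than [math]\alpha[/math]. The function names and the tolerance are assumptions made for illustration.

```python
import math

def gamma_pdf(x: float, alpha: float, theta: float) -> float:
    """Density of Gamma(shape=alpha, scale=theta); support taken as x > 0."""
    if x <= 0:
        return 0.0
    log_f = ((alpha - 1.0) * math.log(x) - x / theta
             - alpha * math.log(theta) - math.lgamma(alpha))
    return math.exp(log_f)

def gamma_cdf(x: float, alpha: float, theta: float,
              tol: float = 1e-14, max_terms: int = 10_000) -> float:
    """Regularized lower incomplete gamma function P(alpha, x/theta), via the
    series t^alpha e^{-t} / Gamma(alpha) * sum_n t^n / (alpha (alpha+1) ... (alpha+n))."""
    if x <= 0:
        return 0.0
    t = x / theta
    term = 1.0 / alpha   # n = 0 term of the series
    total = term
    a = alpha
    for _ in range(max_terms):
        a += 1.0
        term *= t / a
        total += term
        if term < total * tol:
            break
    return total * math.exp(-t + alpha * math.log(t) - math.lgamma(alpha))
```

With [math]\alpha = 1[/math] the result reduces to the exponential CDF [math]1 - e^{-x/\theta}[/math], which gives a convenient check.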
Properties
Summation
If [math]X_i[/math] has a [math]\textrm{Gamma}(\alpha_i, \theta)[/math] distribution for [math]i =1,\ldots, N[/math] (i.e., all distributions have the same scale parameter [math]\theta[/math]), then
[math]\sum_{i=1}^{N} X_i \sim \textrm{Gamma}\left(\sum_{i=1}^{N} \alpha_i, \theta\right)[/math]
provided all [math]X_i[/math] are independent. This shows that the gamma distribution exhibits infinite divisibility. For the cases where the [math]X_i[/math] are independent but have different scale parameters see Mathai (1982) and Moschopoulos (1985).
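The summation property can be sanity-checked by simulation: independent gamma variables with a common scale [math]\theta[/math] should sum to a variable whose mean and variance match a gamma distribution with shape [math]\sum \alpha_i[/math], namely [math]\theta \sum \alpha_i[/math] and [math]\theta^2 \sum \alpha_i[/math]. The sketch below uses illustrative shapes 1.5 and 2.5 with scale 2. Matching two moments is only a consistency check, not a proof of the distributional identity.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
theta = 2.0               # common scale parameter
alphas = [1.5, 2.5]       # illustrative shape parameters of the summands
n = 100_000

# random.gammavariate(alpha, beta) takes the shape alpha and the scale beta.
sums = [sum(rng.gammavariate(a, theta) for a in alphas) for _ in range(n)]

alpha_total = sum(alphas)
expected_mean = alpha_total * theta        # mean of Gamma(alpha_total, theta)
expected_var = alpha_total * theta ** 2    # variance of Gamma(alpha_total, theta)
sample_mean = statistics.fmean(sums)
sample_var = statistics.pvariance(sums)
```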
Related distributions
Distribution | Relation |
---|---|
Exponential | If [math]X \sim \textrm{Gamma}(1,1/\lambda) [/math] then [math]X[/math] has an exponential distribution with rate parameter [math]\lambda [/math]. |
Chi-square | If [math]X \sim \textrm{Gamma}(\nu/2,2) [/math] then [math]X[/math] is identical to [math]\chi^2(\nu)[/math], the chi-squared distribution with [math]\nu[/math] degrees of freedom. Conversely, if [math] Q \sim \chi^2(\nu)[/math] and [math]c[/math] is a positive constant, then [math]cQ \sim \textrm{Gamma}(\nu/2,2c) [/math]. |
Waiting time of Poisson process | If [math]\alpha[/math] is a positive integer, the gamma distribution is an Erlang distribution and is the probability distribution of the waiting time until the [math]\alpha[/math]^{th} arrival in a one-dimensional Poisson process with intensity [math]\theta^{-1}[/math]. |
Inverse Gamma | If [math]X \sim \textrm{Gamma}(\alpha,\theta) [/math], then [math]1/X \sim \textrm{Inv-Gamma}(\alpha, \theta^{-1})[/math] (see Inverse-gamma distribution for derivation). |
Beta | If [math]X \sim \textrm{Gamma}(\alpha,\theta) [/math] and [math]Y \sim \textrm{Gamma}(\beta,\theta) [/math] are independently distributed, then [math]X/(X+Y)[/math] has a Beta distribution with parameters [math]\alpha[/math] and [math]\beta[/math]. |
Gaussian/Normal | For large [math]\alpha[/math] the gamma distribution converges to a Gaussian distribution with mean [math]\mu = \alpha \theta[/math] and variance [math]\sigma^2 =\alpha \theta ^2[/math]. |
Uniform
The continuous uniform distribution or rectangular distribution is a family of symmetric probability distributions such that for each member of the family, all intervals of the same length on the distribution's support are equally probable. The support is defined by the two parameters, [math]a[/math] and [math]b[/math], which are its minimum and maximum values. The distribution is often abbreviated [math]U(a,b)[/math].
Characterization
Probability density function
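The probability density function of the continuous uniform distribution is:
[math]f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b, \\ 0 & \text{for } x \lt a \text{ or } x \gt b. \end{cases}[/math]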
The values of [math]f(x)[/math] at the two boundaries [math]a[/math] and [math]b[/math] are usually unimportant because they do not alter the values of the integrals of [math]f(x) [/math] over any interval, nor of [math]x f(x) [/math] or any higher moment.
Cumulative distribution function
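The cumulative distribution function is:
[math]F(x) = \begin{cases} 0 & \text{for } x \lt a, \\ \frac{x-a}{b-a} & \text{for } a \le x \le b, \\ 1 & \text{for } x \gt b. \end{cases}[/math]
Its inverse is:
[math]F^{-1}(p) = a + p(b-a) \quad \text{for } 0 \lt p \lt 1.[/math]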
Moment-generating function
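The moment generating function is:^{[12]}
[math]M_X(t) = \operatorname{E}\left[e^{tX}\right] = \frac{e^{tb} - e^{ta}}{t(b-a)} \quad \text{for } t \ne 0,[/math]
from which we may calculate the raw moments [math]m_k[/math] with
[math]m_1 = \frac{a+b}{2}, \qquad m_2 = \frac{a^2 + ab + b^2}{3}, \qquad m_k = \frac{1}{k+1} \sum_{i=0}^{k} a^i b^{k-i}.[/math]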
Properties
Mean and Variance
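The mean (first moment) of the distribution is:
[math]\operatorname{E}(X) = \frac{a+b}{2}.[/math]
The variance (second central moment) is:
[math]\operatorname{Var}(X) = \frac{(b-a)^2}{12}.[/math]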
Order statistics
Let [math]X_1,\ldots,X_n[/math] be an i.i.d. sample from [math]U(0,1)[/math]. Let [math]X_{(k)}[/math] be the [math]k[/math]^{th} order statistic from this sample. Then the probability distribution of [math]X_{(k)}[/math] is a Beta distribution with parameters [math]k[/math] and [math]n-k+1[/math]. The expected value is
[math]\operatorname{E}\left(X_{(k)}\right) = \frac{k}{n+1}.[/math]
The variances are
[math]\operatorname{Var}\left(X_{(k)}\right) = \frac{k(n-k+1)}{(n+1)^2(n+2)}.[/math]
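For a Beta distribution with parameters [math]k[/math] and [math]n-k+1[/math], the mean is [math]k/(n+1)[/math] and the variance is [math]k(n-k+1)/((n+1)^2(n+2))[/math]. The simulation below is a sketch that compares these values with the empirical moments of the [math]k[/math]^{th} order statistic. The choices [math]n=5[/math], [math]k=2[/math] and the helper name are illustrative.

```python
import random
import statistics

def order_statistic_samples(n: int, k: int, trials: int, rng: random.Random) -> list:
    """Draw `trials` values of the k-th smallest of n i.i.d. U(0,1) variables."""
    return [sorted(rng.random() for _ in range(n))[k - 1] for _ in range(trials)]

rng = random.Random(1)   # fixed seed so the run is repeatable
n, k = 5, 2
samples = order_statistic_samples(n, k, 50_000, rng)

theory_mean = k / (n + 1)
theory_var = k * (n - k + 1) / ((n + 1) ** 2 * (n + 2))
sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)
```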
Uniformity
The probability that a uniformly distributed random variable falls within any interval of fixed length is independent of the location of the interval itself (but it is dependent on the interval size), so long as the interval is contained in the distribution's support.
To see this, if [math]X \sim U(a,b)[/math] and [math][x,x+d][/math] is a subinterval of [math][a,b][/math] with fixed [math]d \gt 0[/math], then
[math]P\left(X \in [x, x+d]\right) = \int_{x}^{x+d} \frac{dy}{b-a} = \frac{d}{b-a},[/math]
which is independent of [math]x[/math]. This fact motivates the distribution's name.
Standard uniform
Restricting [math]a=0[/math] and [math]b=1[/math], the resulting distribution [math]U(0,1)[/math] is called a standard uniform distribution.
One interesting property of the standard uniform distribution is that if [math]U[/math] has a standard uniform distribution, then so does [math]1-U[/math]. This property can be used for generating antithetic variates, among other things.
Related distributions
Distribution | Relation |
---|---|
Exponential | If [math]X[/math] has a standard uniform distribution, then by the inverse transform sampling method, [math]Y = -\lambda^{-1}\ln(X)[/math] has an exponential distribution with (rate) parameter [math]\lambda [/math]. |
Beta | If [math]X[/math] has a standard uniform distribution, then [math]Y = X^n [/math] has a beta distribution with parameters [math]n^{-1}[/math] and 1. (Note this implies that the standard uniform distribution is a special case of the beta distribution, with parameters 1 and 1.) |
Irwin-Hall | The Irwin–Hall distribution is the distribution of the sum of [math]n[/math] i.i.d. [math]U(0,1)[/math] random variables. |
Symmetric triangle | The sum of two independent, equally distributed, uniform distributions yields a symmetric triangular distribution. |
Triangle | The distance between two i.i.d. uniform random variables also has a triangular distribution, although not symmetric. |
Beta | The uniform distribution can be thought of as a beta distribution with parameters (1,1). |
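The Exponential row of the table describes inverse transform sampling, which can be sketched as follows. The code draws a standard uniform variate and returns [math]-\ln(U)/\lambda[/math]. It uses `1 - rng.random()` because `random()` can return exactly 0 and, as noted above, [math]1-U[/math] is again standard uniform. The function name and the rate [math]\lambda = 2[/math] are illustrative.

```python
import math
import random
import statistics

def sample_exponential(rate: float, rng: random.Random) -> float:
    """Inverse transform sampling: Y = -ln(U) / rate with U standard uniform."""
    u = 1.0 - rng.random()   # lies in (0, 1], so log(u) is always defined
    return -math.log(u) / rate

rng = random.Random(7)       # fixed seed so the run is repeatable
rate = 2.0
draws = [sample_exponential(rate, rng) for _ in range(100_000)]
sample_mean = statistics.fmean(draws)   # should be close to 1 / rate
sample_sd = statistics.pstdev(draws)    # should also be close to 1 / rate
```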
Exponential
The exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson processes, it is found in various other contexts.
The exponential distribution is not the same as the class of exponential families of distributions, which is a large class of probability distributions that includes the exponential distribution as one of its members, but also includes the normal distribution, binomial distribution, gamma distribution, Poisson, and many others.
Characterization
Probability density function
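The probability density function (pdf) of an exponential distribution is
[math]f(x;\lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0, \\ 0 & x \lt 0. \end{cases}[/math]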
Here [math]\lambda \gt 0 [/math] is the parameter of the distribution, often called the rate parameter. The distribution is supported on the interval [0, ∞). If a random variable [math]X[/math] has this distribution, we write [math]X \sim \textrm{Exp}(\lambda) [/math].
The exponential distribution exhibits infinite divisibility.
Cumulative distribution function
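The cumulative distribution function is given by
[math]F(x;\lambda) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0, \\ 0 & x \lt 0. \end{cases}[/math]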
Alternative parameterization
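A commonly used alternative parametrization is to define the probability density function (pdf) of an exponential distribution as
[math]f(x;\theta) = \begin{cases} \frac{1}{\theta} e^{-x/\theta} & x \ge 0, \\ 0 & x \lt 0, \end{cases}[/math]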
where [math]\theta \gt 0 [/math] is the mean, standard deviation, and scale parameter of the distribution, the reciprocal of the rate parameter, [math]\lambda [/math], defined above. In this specification, [math]\theta [/math] is a survival parameter in the sense that if a random variable [math]X[/math] is the duration of time that a given biological or mechanical system manages to survive and [math]X \sim \textrm{Exp}(\theta) [/math] then [math]\operatorname{E}[X] = \theta [/math]. That is to say, the expected duration of survival of the system is [math]\theta[/math] units of time. The parametrization involving the "rate" parameter arises in the context of events arriving at a rate [math]\lambda [/math], when the time between events (which might be modeled using an exponential distribution) has a mean of [math]\theta = \lambda^{-1} [/math].
The alternative specification is sometimes more convenient than the one given above, and some authors will use it as a standard definition. Unfortunately this gives rise to a notational ambiguity. In general, the reader must check which of these two specifications is being used if an author writes [math]X \sim \textrm{Exp}(\lambda)[/math], since either the notation in the previous section (using [math]\lambda[/math]) or the notation in this section (here, using [math]\theta[/math] to avoid confusion) could be intended.
Properties
Mean, variance, moments and median
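The mean or expected value of an exponentially distributed random variable [math]X[/math] with rate parameter [math]\lambda[/math] is given by
[math]\operatorname{E}[X] = \frac{1}{\lambda} = \theta,[/math]
as noted above. For example, if you receive phone calls at an average rate of 2 per hour, then you can expect to wait half an hour for every call.
The variance of [math]X[/math] is given by
[math]\operatorname{Var}[X] = \frac{1}{\lambda^2} = \theta^2,[/math]
so the standard deviation is equal to the mean.
The moments of [math]X[/math], for [math]n = 1, 2, \ldots[/math] are given by
[math]\operatorname{E}\left[X^n\right] = \frac{n!}{\lambda^n}.[/math]
The median of [math]X[/math] is given by
[math]\operatorname{m}[X] = \frac{\ln 2}{\lambda} \lt \operatorname{E}[X],[/math]
where ln refers to the natural logarithm. Thus the absolute difference between the mean and median is
[math]\left|\operatorname{E}[X] - \operatorname{m}[X]\right| = \frac{1 - \ln 2}{\lambda} \lt \frac{1}{\lambda} = \text{standard deviation},[/math]
in accordance with the median-mean inequality.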
Memorylessness
An exponentially distributed random variable [math]T[/math] obeys the relation
[math]\Pr(T \gt s + t \mid T \gt s) = \Pr(T \gt t) \quad \text{for all } s, t \ge 0.[/math]
When [math]T[/math] is interpreted as the waiting time for an event to occur relative to some initial time, this relation implies that, if [math]T[/math] is conditioned on a failure to observe the event over some initial period of time [math]s[/math], the distribution of the remaining waiting time is the same as the original unconditional distribution. For example, if an event has not occurred after 30 seconds, the conditional probability that occurrence will take at least 10 more seconds is equal to the unconditional probability of observing the event more than 10 seconds relative to the initial time.
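The 30-second example can be checked numerically from the survival function [math]\Pr(T \gt t) = e^{-\lambda t}[/math]. The sketch below assumes an illustrative rate of [math]\lambda = 0.1[/math] events per second; any positive rate gives the same agreement.

```python
import math

def survival(t: float, rate: float) -> float:
    """P(T > t) = exp(-rate * t) for T ~ Exp(rate) and t >= 0."""
    return math.exp(-rate * t)

rate, s, t = 0.1, 30.0, 10.0   # illustrative rate; 30 s already waited; 10 s more

# P(T > s + t | T > s) = P(T > s + t) / P(T > s)
conditional = survival(s + t, rate) / survival(s, rate)
# P(T > t), the unconditional probability of waiting more than t
unconditional = survival(t, rate)
```

The two values agree up to floating-point rounding because [math]e^{-\lambda(s+t)}/e^{-\lambda s} = e^{-\lambda t}[/math].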
The exponential distribution and the geometric distribution are the only memoryless probability distributions; consequently, the exponential distribution is the only continuous probability distribution that has a constant failure rate.
Notes
- Normal Distribution, Gale Encyclopedia of Psychology
- Casella & Berger (2001, p. 102)
- Lyon, A. (2014). Why are Normal Distributions Normal?, The British Journal for the Philosophy of Science.
- For the proof see Gaussian Integral
- Halperin, Hartley & Hoel (1965, item 7)
- McPherson (1990, p. 110)
- Patel & Read (1996, [2.1.4])
- Fan (1991, p. 1258)
- Scott, Clayton; Nowak, Robert (August 7, 2003). "The Q-function". Connexions.
- Barak, Ohad (April 6, 2006). "Q Function and Error Function" (PDF). Tel Aviv University.
- See Hogg and Craig (1978, Remark 3.3.1) for an explicit motivation
- Casella & Berger 2001, p. 626
References
- R. V. Hogg and A. T. Craig (1978) Introduction to Mathematical Statistics, 4th edition. New York: Macmillan. (See Section 3.3.)
- P. G. Moschopoulos (1985) The distribution of the sum of independent gamma random variables, Annals of the Institute of Statistical Mathematics, 37, 541–544
- A. M. Mathai (1982) Storage capacity of a dam with gamma type inputs, Annals of the Institute of Statistical Mathematics, 34, 591–597
- Wikipedia contributors. "Uniform distribution". Wikipedia. Wikipedia. Retrieved 28 January 2022.
- Wikipedia contributors. "Gamma distribution". Wikipedia. Wikipedia. Retrieved 28 January 2022.
- Wikipedia contributors. "Exponential distribution". Wikipedia. Wikipedia. Retrieved 28 January 2022.
- Casella, George; Berger, Roger L. (2001). Statistical Inference (2nd ed.). Duxbury. ISBN 0-534-24312-6.
- "Recommended Standards for Statistical Symbols and Notation. COPSS Committee on Symbols and Notation" (1965). The American Statistician 19 (3): 12–14.
- McPherson, Glen (1990). Statistics in Scientific Investigation: Its Basis, Application and Interpretation. Springer-Verlag. ISBN 0-387-97137-8.
- Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory. John Wiley and Sons.
- Fan, Jianqing (1991). "On the optimal rates of convergence for nonparametric deconvolution problems". The Annals of Statistics 19 (3): 1257–1272.
- Wikipedia contributors. "Normal distribution". Wikipedia. Wikipedia. Retrieved 28 January 2022.