# Discrete Distributions

## Binomial

The binomial distribution with parameters $n$ and $p$ is the discrete probability distribution of the number of successes in a sequence of $n$ independent yes/no experiments, each of which yields success with probability $p$. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when $n = 1$, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size $n$ drawn with replacement from a population of size $N$. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for $N$ much larger than $n$, the binomial distribution is a good approximation, and widely used.

### Specification

#### Probability mass function

In general, if the random variable $X$ follows the binomial distribution with parameters $n$ ∈ ℕ and $p$ ∈ [0,1], we write $X \sim B(n,p)$. The probability of getting exactly $k$ successes in $n$ trials is given by the probability mass function:

[$] f(k;n,p) = \operatorname{P}(X = k) = \binom n k p^k(1-p)^{n-k}[$]

for $k = 0, \ldots, n$ where

[$]\binom n k =\frac{n!}{k!(n-k)!}[$]

is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want exactly $k$ successes ($p^k$) and $n-k$ failures ($(1-p)^{n-k}$). However, the $k$ successes can occur anywhere among the $n$ trials, and there are ${n\choose k}$ different ways of distributing $k$ successes in a sequence of $n$ trials.

In creating reference tables for binomial distribution probability, usually the table is filled in up to $n/2$ values. This is because for $k \gt n/2$, the probability can be calculated by its complement as

[$]f(k,n,p)=f(n-k,n,1-p). [$]
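As a minimal sketch, the probability mass function and the symmetry relation used for reference tables can be checked numerically. The function name `binom_pmf` and the values $n = 10$, $p = 0.3$ are illustrative assumptions, not part of the text.

```python
from math import comb, isclose

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # illustrative parameters

# The probabilities over k = 0..n should sum to 1.
total = sum(binom_pmf(k, n, p) for k in range(n + 1))

# Symmetry used for tables: f(k, n, p) = f(n - k, n, 1 - p).
symmetric = all(
    isclose(binom_pmf(k, n, p), binom_pmf(n - k, n, 1 - p))
    for k in range(n + 1)
)
print(total, symmetric)
```

Because of this symmetry, a table only needs to cover half of the $k$ values for each pair $p$ and $1-p$.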

The probability mass function satisfies the following recurrence relation, for every $n,p$:

[$]\left\{\begin{array}{l} p (n-k) f(k,n,p) = (k+1) (1-p) f(k+1,n,p), \\[10pt] f(0,n,p)=(1-p)^n \end{array}\right\}[$]
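The recurrence gives a simple way to tabulate all probabilities without recomputing binomial coefficients. The sketch below assumes $0 \le p \lt 1$ (so that the division by $1-p$ is defined); the name `binom_pmf_table` and the example parameters are hypothetical.

```python
from math import comb, isclose

def binom_pmf_table(n: int, p: float) -> list[float]:
    """All f(k, n, p) for k = 0..n via the recurrence, assuming 0 <= p < 1."""
    probs = [(1 - p) ** n]  # starting value f(0, n, p) = (1 - p)^n
    for k in range(n):
        # Rearranged recurrence: f(k+1) = f(k) * p (n - k) / ((k + 1)(1 - p))
        probs.append(probs[-1] * p * (n - k) / ((k + 1) * (1 - p)))
    return probs

table = binom_pmf_table(12, 0.25)

# Direct evaluation of the probability mass function, for comparison.
direct = [comb(12, k) * 0.25**k * 0.75 ** (12 - k) for k in range(13)]
print(all(isclose(a, b) for a, b in zip(table, direct)))
```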

Looking at the expression $f(k,n,p)$ as a function of $k$, there is a $k$ value that maximizes it. This $k$ value can be found by calculating

[$] \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} [$]

and comparing it to 1. There is always an integer $M$ that satisfies

[$](n+1)p-1 \leq M \lt (n+1)p.[$]

$f(k,n,p)$ is monotone increasing for $k \lt M$ and monotone decreasing for $k \gt M$, with the exception of the case where $(n+1)p$ is an integer. In this case, there are two values for which $f$ is maximal: $(n+1)p$ and $(n+1)p-1$. $M$ is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small.

#### Cumulative distribution function

The cumulative distribution function can be expressed as:

[$]F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}[$]

where $\lfloor k\rfloor\,$ is the "floor" under $k$, i.e. the greatest integer less than or equal to $k$.

### Mean and Variance

If $X \sim B(n,p)$, that is, $X$ is a binomially distributed random variable, $n$ being the total number of experiments and $p$ the probability of each experiment yielding a successful result, then the expected value of $X$ is $np$ and the variance is $np(1-p)$. This follows directly from the fact that $X$ is equal in distribution to the sum of $n$ independent Bernoulli random variables each having success probability $p$ (see below).
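The step can be spelled out as follows, writing $X = X_1 + \cdots + X_n$ with independent Bernoulli variables $X_i$ (the symbols $X_i$ are introduced here only for illustration). Since $X_i^2 = X_i$,

[$]\operatorname{E}(X_i) = p, \qquad \operatorname{Var}(X_i) = \operatorname{E}(X_i^2) - \operatorname{E}(X_i)^2 = p - p^2 = p(1-p),[$]

so that, by linearity of expectation and additivity of the variance for independent variables,

[$]\operatorname{E}(X) = \sum_{i=1}^n \operatorname{E}(X_i) = np, \qquad \operatorname{Var}(X) = \sum_{i=1}^n \operatorname{Var}(X_i) = np(1-p).[$]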

### Mode

Usually the mode of a binomial $B(n,p)$ distribution is equal to $\lfloor (n+1)p\rfloor$, where $\lfloor\cdot\rfloor$ is the floor function. However, when $(n+1)p$ is an integer and $p$ is neither 0 nor 1, the distribution has two modes: $(n+1)p$ and $(n+1)p -1$. When $p$ is equal to 0 or 1, the mode is 0 or $n$, respectively. These cases can be summarized as follows:

[$] \begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases}[$]
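The case distinction can be sketched in code and compared against a brute-force search for the largest probability. Exact rational arithmetic (`fractions.Fraction`) is used here as a design choice, because testing whether $(n+1)p$ is an integer is unreliable in floating point; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, floor

def binom_modes(n: int, p: Fraction) -> list[int]:
    """Mode(s) of B(n, p) following the three cases above."""
    t = (n + 1) * p
    if p == 1:                       # (n + 1) p = n + 1
        return [n]
    if t == 0 or t.denominator != 1:  # zero or a non-integer
        return [floor(t)]
    return [int(t) - 1, int(t)]      # integer in {1, ..., n}: two modes

def argmax_pmf(n: int, p: Fraction) -> list[int]:
    """All k that maximize the probability mass function, by direct search."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    best = max(probs)
    return [k for k, f in enumerate(probs) if f == best]

print(binom_modes(9, Fraction(1, 2)))   # (n + 1) p = 5 is an integer
print(binom_modes(10, Fraction(3, 10)))  # (n + 1) p = 3.3 is not
```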


### Median

In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique. However several special results have been established:

• If $np$ is an integer, then the mean, median, and mode coincide and equal $np$.
• Any median $m$ must lie within the interval ⌊$np$⌋ ≤ $m$ ≤ ⌈$np$⌉.
• A median $m$ cannot lie too far away from the mean: $|m-np| \leq \min\{\ln(2),\max\{p,1-p\}\}$.
• The median is unique and equal to $m=$round($np$) in cases when either $p\leq 1-\ln(2)$ or $p\geq \ln(2)$ or $|m-np| \leq \textrm{min}\{p, 1-p\}$ (except for the case when $p = 1/2$ and $n$ is odd).
• When $p=1/2$ and $n$ is odd, any number $m$ in the interval $[(n-1)/2,(n+1)/2]$ is a median of the binomial distribution. If $p = 1/2$ and $n$ is even, then $m = n/2$ is the unique median.

### Related distributions

#### Sums of binomials

If $X \sim B(n,p)$ and $Y \sim B(m, p)$ are independent binomial variables with the same probability $p$, then $X+Y$ is again a binomial variable: its distribution is $Z=X+Y \sim B(n+m, p)$.
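A quick numerical check of this closure property: the distribution of a sum of independent variables is the convolution of their probability mass functions. The helper names and the parameters $n = 4$, $m = 6$, $p = 0.35$ below are illustrative assumptions.

```python
from math import comb, isclose

def binom_pmf_list(n: int, p: float) -> list[float]:
    """Probability mass function of B(n, p) as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a: list[float], b: list[float]) -> list[float]:
    """Discrete convolution: the pmf of the sum of two independent variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p = 0.35
pmf_sum = convolve(binom_pmf_list(4, p), binom_pmf_list(6, p))  # X + Y
pmf_direct = binom_pmf_list(10, p)                               # B(4 + 6, p)
print(all(isclose(a, b) for a, b in zip(pmf_sum, pmf_direct)))
```

The check relies on both variables sharing the same $p$; with different success probabilities the convolution is in general not binomial.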

#### Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where $n = 1$. Symbolically, $X \sim B(1,p)$ has the same meaning as $X \sim B(p)$. Conversely, any binomial distribution, $B(n,p)$, is the distribution of the sum of $n$ Bernoulli trials, $B(p)$, each with the same probability $p$.

#### Normal approximation

If $n$ is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to $B(n,p)$ is given by the normal distribution $\mathcal{N}(np,\,np(1-p))$, and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as $n$ increases (at least 20) and is better when $p$ is not near to 0 or 1. Various heuristics may be used to decide whether $n$ is large enough, and $p$ is far enough from the extremes of zero or one:

• One rule is that both $np$ and $n(1-p)$ must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large $n$ until $n$ is very large.
• A second rule is that for $n\gt5$ the normal approximation is adequate if

[$]\left | \left (\frac{1}{\sqrt{n}} \right ) \left (\sqrt{\frac{1-p}{p}}-\sqrt{\frac{p}{1-p}} \right ) \right |\lt0.3[$]

• Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is if

[$]\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].[$]
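The three heuristics can be evaluated side by side, as in the sketch below. It assumes $0 \lt p \lt 1$ and reads the first rule as requiring both $np$ and $n(1-p)$ to exceed 5; the function name and dictionary keys are hypothetical.

```python
from math import sqrt

def normal_approx_checks(n: int, p: float) -> dict[str, bool]:
    """Evaluate the three rules of thumb; assumes 0 < p < 1."""
    q = 1 - p
    sigma = sqrt(n * p * q)
    return {
        # Rule 1: both np and n(1 - p) greater than 5.
        "np_and_nq_above_5": n * p > 5 and n * q > 5,
        # Rule 2: n > 5 and the scaled skewness term below 0.3.
        "skewness_rule": n > 5
        and abs((sqrt(q / p) - sqrt(p / q)) / sqrt(n)) < 0.3,
        # Rule 3: mean +/- 3 standard deviations stays inside [0, n].
        "three_sigma_rule": n * p - 3 * sigma >= 0 and n * p + 3 * sigma <= n,
    }

print(normal_approx_checks(100, 0.5))
print(normal_approx_checks(20, 0.05))
```

The rules need not agree with each other for borderline parameters, which is why the text treats them as heuristics rather than criteria.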

The following is an example of applying a continuity correction. Suppose one wishes to calculate $\operatorname{P}(X \leq 8)$ for a binomial random variable $X$. If $Y$ has a distribution given by the normal approximation, then $\operatorname{P}(X \leq 8 )$ is approximated by $\operatorname{P}(Y \leq 8.5 )$. The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
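To make the effect of the correction concrete, the sketch below compares the exact value of $\operatorname{P}(X \leq 8)$ with the normal approximation with and without the correction. The parameters $n = 20$ and $p = 0.5$ are assumed for illustration only; the text does not specify them.

```python
from math import comb, erf, sqrt

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(Y <= x) for Y ~ N(mean, sd^2), via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

n, p, k = 20, 0.5, 8  # illustrative values, not from the text
mean, sd = n * p, sqrt(n * p * (1 - p))

exact = binom_cdf(k, n, p)
uncorrected = normal_cdf(k, mean, sd)       # P(Y <= 8)
corrected = normal_cdf(k + 0.5, mean, sd)   # P(Y <= 8.5)

print(exact, uncorrected, corrected)
```

For these parameters the corrected value is much closer to the exact probability than the uncorrected one.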

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand (exact calculations with large $n$ are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem since $B(n,p)$ is a sum of $n$ independent, identically distributed Bernoulli variables with parameter $p$. This fact is the basis of a hypothesis test, a "proportion z-test", for the value of $p$ using $x/n$, the sample proportion and estimator of $p$, in a common test statistic.

For example, suppose one randomly samples $n$ people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of $n$ people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion $p$ of agreement in the population and with standard deviation $\sigma = \sqrt{\frac{p(1-p)}{n}}$.

## Geometric

The geometric distribution is the probability distribution of the number of failures before the first success, supported on the set { 0, 1, 2, 3, ... }. That is, if $p$ denotes the probability of success on each trial, then

[$]\operatorname{P}(Y=k) = (1 - p)^k\,p\,.[$]

To retain consistency with the notation found in , we set $\beta = (1-p)/p$, so that $p = 1/(1+\beta)$ and $1-p = \beta/(1+\beta)$, and obtain:

[$]\begin{equation}\label{geometric}\operatorname{P}(Y=k) = \frac{\beta^k}{(1+\beta)^{k+1}}.\end{equation}[$]

Going forward, we assume that a geometric distribution is characterized by \ref{geometric} and depends solely on $\beta$ which turns out to be the mean of the distribution.

### Moments

The expected value of the geometrically distributed random variable $Y$ is $\beta$ and its variance is $\beta(\beta +1)$:

[$] \operatorname{E}(Y) = \beta, \qquad\operatorname{Var}(Y) = \beta(1 + \beta). [$]
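As a sketch, the two forms of the probability mass function and the stated moments can be checked numerically, taking $\beta = (1-p)/p$ (the mean number of failures). The function names, the value $p = 0.2$, and the truncation of the infinite sums at 500 terms are assumptions made for illustration.

```python
from math import isclose

def geom_pmf_p(k: int, p: float) -> float:
    """P(Y = k) = (1 - p)^k p, the number of failures before the first success."""
    return (1 - p) ** k * p

def geom_pmf_beta(k: int, beta: float) -> float:
    """P(Y = k) = beta^k / (1 + beta)^(k + 1).

    Written as (beta / (1 + beta))^k / (1 + beta), which is algebraically
    the same and avoids overflow for large k.
    """
    return (beta / (1 + beta)) ** k / (1 + beta)

p = 0.2
beta = (1 - p) / p
ks = range(500)  # truncation point; the neglected tail is negligible here

same = all(isclose(geom_pmf_p(k, p), geom_pmf_beta(k, beta)) for k in ks)
mean = sum(k * geom_pmf_beta(k, beta) for k in ks)
var = sum(k * k * geom_pmf_beta(k, beta) for k in ks) - mean**2
print(same, mean, var)
```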

### Related distributions

• The geometric distribution $Y$ is a special case of the negative binomial distribution, with $r = 1$. More generally, if $Y_1,\ldots,Y_r$ are independent geometrically distributed variables with parameter $p$, then the sum

[$]Z = \sum_{m=1}^r Y_m[$]

follows a negative binomial distribution with parameters $r$ and $p$.

• If $Y_1,\ldots,Y_r$ are independent geometrically distributed variables (with possibly different success parameters $p_m$), then their minimum

[$]W = \min_{m \in \{1, \dots, r\}} Y_m\,[$]

is also geometrically distributed, with parameter $p = 1-\prod_m(1-p_{m}).$

• Suppose $0 \lt r \lt 1$, and for $k = 1,2,3,\ldots$ the random variable $X_k$ has a Poisson distribution with expected value $r^k/k$. Then

[$]\sum_{k=1}^\infty k\,X_k[$]

has a geometric distribution taking values in the set {0, 1, 2, ...}, with expected value $r/(1-r)$.

• The exponential distribution is the continuous analogue of the geometric distribution. If $X$ is an exponentially distributed random variable with parameter $\lambda$, then

[$]Y = \lfloor X \rfloor,[$]

where $\lfloor \quad \rfloor$ is the floor (or greatest integer) function, is a geometrically distributed random variable with parameter $p = 1- e^{-\lambda}$ (thus $\lambda = - \ln(1-p)$) and taking values in the set {0, 1, 2, ...}.
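Two of the relations above, the minimum of independent geometric variables and the floor of an exponential variable, can be verified from survival probabilities, using $\operatorname{P}(Y \geq k) = (1-p)^k$ for a geometric variable and $\operatorname{P}(X \geq x) = e^{-\lambda x}$ for an exponential one. The sketch below uses hypothetical names and illustrative parameter values.

```python
from math import exp, isclose, prod

def geom_pmf(k: int, p: float) -> float:
    """P(Y = k) = (1 - p)^k p on {0, 1, 2, ...}."""
    return (1 - p) ** k * p

# Minimum of independent geometrics with different success probabilities.
ps = [0.1, 0.25, 0.4]                  # illustrative p_m values
p_min = 1 - prod(1 - pm for pm in ps)  # claimed parameter of the minimum

def min_pmf(k: int) -> float:
    """P(W = k) = P(W >= k) - P(W >= k + 1), with P(W >= j) = prod_m (1 - p_m)^j."""
    def surv(j: int) -> float:
        return prod((1 - pm) ** j for pm in ps)
    return surv(k) - surv(k + 1)

# Floor of an exponential variable with rate lam.
lam = 0.7                              # illustrative rate
p_exp = 1 - exp(-lam)                  # claimed parameter of floor(X)

def floor_exp_pmf(k: int) -> float:
    """P(floor(X) = k) = P(k <= X < k + 1) = e^(-lam k) - e^(-lam (k + 1))."""
    return exp(-lam * k) - exp(-lam * (k + 1))

print(all(isclose(min_pmf(k), geom_pmf(k, p_min)) for k in range(20)))
print(all(isclose(floor_exp_pmf(k), geom_pmf(k, p_exp)) for k in range(20)))
```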

## Poisson Distribution

The Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. Within the context of insurance, the Poisson distribution can be used to model the number (frequency) of claims during a given time period.

### Definition

A discrete random variable $X$ is said to have a Poisson distribution with parameter $\lambda \gt 0$, if, for $k = 0, 1, \ldots$, the probability mass function of $X$ is given by

[$]\!p_k= \operatorname{P}(X = k)= \frac{\lambda^k e^{-\lambda}}{k!}.[$]

The probability mass function satisfies the following recurrence relation:

[$]\left\{\begin{array}{l} (k+1) p_{k+1}-\lambda p_{k}=0, \\ p_{0}=e^{-\lambda} \end{array}\right\}. [$]
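The recurrence avoids evaluating large powers and factorials separately. A minimal sketch follows, with the hypothetical name `poisson_pmf_table` and illustrative values $\lambda = 3.5$ and a cutoff of $k = 40$.

```python
from math import exp, factorial, isclose

def poisson_pmf_table(lam: float, kmax: int) -> list[float]:
    """p_0, ..., p_kmax via p_{k+1} = lam * p_k / (k + 1), starting at p_0 = e^(-lam)."""
    probs = [exp(-lam)]
    for k in range(kmax):
        probs.append(probs[-1] * lam / (k + 1))
    return probs

table = poisson_pmf_table(3.5, 40)

# Direct evaluation of lam^k e^(-lam) / k!, for comparison.
direct = [3.5**k * exp(-3.5) / factorial(k) for k in range(41)]
print(all(isclose(a, b) for a, b in zip(table, direct)))
```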

### Properties

#### Mean

The expected value and the variance of a Poisson-distributed random variable are both equal to $\lambda$. The mean absolute deviation about the mean is

[$]\operatorname{E}|X-\lambda|= 2\exp(-\lambda) \frac{\lambda^{\lfloor\lambda\rfloor + 1}}{ \lfloor\lambda\rfloor!} .[$]

• The mode of a Poisson-distributed random variable with non-integer $\lambda$ is equal to $\lfloor \lambda \rfloor$, the largest integer less than or equal to $\lambda$. When $\lambda$ is a positive integer, the modes are $\lambda$ and $\lambda-1$.

#### Median

Bounds for the median ($ν$) of the distribution are known and are sharp:

[$] \lambda - \ln 2 \le \nu \lt \lambda + \frac{1}{3}. [$]

#### Higher moments

The higher moments $m_k$ of the Poisson distribution about the origin are polynomials in $\lambda$:

[$] m_k = \sum_{i=1}^k \lambda^i \left\{\begin{matrix} k \\ i \end{matrix}\right\},[$]

where the {braces} denote Stirling numbers of the second kind. The coefficients of the polynomials have a combinatorial meaning. In fact, when the expected value of the Poisson distribution is 1, then Dobiński's formula says that the $n$th moment equals the number of partitions of a set of size $n$.
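The moment formula can be sketched directly from the standard recurrence for Stirling numbers of the second kind, $S(k,i) = i\,S(k-1,i) + S(k-1,i-1)$. The function names below are hypothetical; with $\lambda = 1$ the moments reproduce the counts of set partitions (the Bell numbers).

```python
from math import exp, factorial, isclose

def stirling2(k: int, i: int) -> int:
    """Stirling number of the second kind via S(k,i) = i S(k-1,i) + S(k-1,i-1)."""
    if k == i:
        return 1
    if i == 0 or i > k:
        return 0
    return i * stirling2(k - 1, i) + stirling2(k - 1, i - 1)

def poisson_moment(k: int, lam: float) -> float:
    """k-th moment about the origin: sum over i of lam^i * S(k, i)."""
    return sum(lam**i * stirling2(k, i) for i in range(1, k + 1))

# With lam = 1 the moments count the partitions of a set of size k.
bell = [round(poisson_moment(k, 1.0)) for k in range(1, 6)]
print(bell)  # [1, 2, 5, 15, 52]

# Cross-check the third moment for lam = 2 against a truncated direct sum.
direct = sum(k**3 * 2.0**k * exp(-2.0) / factorial(k) for k in range(60))
print(poisson_moment(3, 2.0), direct)
```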

• If $X_i \sim \operatorname{Pois}(\lambda_i)$ are independent and $\lambda=\sum_{i=1}^n \lambda_i$, then $Y = \left( \sum_{i=1}^n X_i \right) \sim \operatorname{Pois}(\lambda)$. A converse is Raikov's theorem, which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.

#### Other properties

• Bounds for the tail probabilities of a Poisson random variable $X \sim \operatorname{Pois}(\lambda)$ can be derived using a Chernoff bound argument:

[$] \begin{cases} \operatorname{P}(X \geq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} & \text{ for } x \gt \lambda \\ \operatorname{P}(X \leq x) \leq \frac{e^{-\lambda} (e \lambda)^x}{x^x} & \text{ for } x \lt \lambda \,\, . \end{cases} [$]
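As an illustration of how loose or tight these bounds are, the sketch below compares exact tail probabilities with the bound for $\lambda = 5$ at $x = 10$ (upper tail) and $x = 2$ (lower tail). The function names and parameter values are assumptions chosen for the example.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam^k e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

def chernoff_bound(x: float, lam: float) -> float:
    """e^(-lam) (e lam)^x / x^x, computed as e^(-lam) * (e lam / x)^x."""
    return exp(-lam) * (exp(1) * lam / x) ** x

lam = 5.0
upper_tail = 1 - sum(poisson_pmf(k, lam) for k in range(10))  # P(X >= 10)
lower_tail = sum(poisson_pmf(k, lam) for k in range(3))       # P(X <= 2)

print(upper_tail, chernoff_bound(10, lam))
print(lower_tail, chernoff_bound(2, lam))
```

In both cases the bound holds but is several times larger than the exact tail probability, as is typical for Chernoff-type bounds.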

## Negative Binomial

The negative binomial distribution is a discrete probability distribution of the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified number of successes (denoted $r$) occurs. More precisely, suppose there is a sequence of independent Bernoulli trials. Thus, each trial has two potential outcomes called “success” and “failure”. In each trial the probability of success is $p$ and of failure is $1-p$. We are observing this sequence until a predefined number $r$ of successes has occurred.

### Probability Mass Function

The probability mass function of the negative binomial distribution is

[$] f(k; r, p) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} (1-p)^kp^r \quad\text{for }k = 0, 1, 2, \dotsc [$]

The binomial coefficient can be written in the following manner, explaining the name “negative binomial”:

[$] \begin{align*} \frac{(k+r-1)\dotsm(r)}{k!} &= (-1)^k \frac{(-r)(-r-1)(-r-2)\dotsm(-r-k+1)}{k!} \\ &= (-1)^k\binom{-r}{k}. \end{align*} [$]

To understand the above definition of the probability mass function, note that the probability for every specific sequence of $k$ failures and $r$ successes is $p^r(1-p)^k$, because the outcomes of the $k+r$ trials are supposed to happen independently. Since the $r$th success comes last, it remains to choose the $k$ trials with failures out of the remaining $k+r-1$ trials. The above binomial coefficient gives precisely the number of all these sequences of length $k+r-1$.
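The counting argument can be checked by brute force for small values: enumerate all success/failure sequences of length $k+r$, keep those with exactly $r$ successes whose last trial is a success, and compare the count with the binomial coefficient. The function name is hypothetical, and the enumeration is only practical for small $k+r$.

```python
from itertools import product
from math import comb

def count_sequences(k: int, r: int) -> int:
    """Count sequences with k failures in which the r-th success comes last."""
    count = 0
    for seq in product("SF", repeat=k + r):
        # Exactly r successes and a success on the final trial means the
        # r-th success occurs on the last trial.
        if seq[-1] == "S" and seq.count("S") == r:
            count += 1
    return count

print(count_sequences(3, 2), comb(3 + 2 - 1, 3))
```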

### Extension to real-valued r

It is possible to extend the definition of the negative binomial distribution to the case of a positive real parameter $r$. Although it is impossible to visualize a non-integer number of “successes”, we can still formally define the distribution through its probability mass function.

In the spirit of being consistent with the parametrizations found in , we consider the alternative parametrization defined implicitly by setting $p = 1/(1+\beta)$.

As before, we say that $N$ has a negative binomial (or Pólya) distribution if it has a probability mass function:

[$] f(k; r, \beta) \equiv \operatorname{P}(N = k) = \binom{k+r-1}{k} \frac{\beta^k}{(1 + \beta)^{r + k}} \quad\text{for }k = 0, 1, 2, \dotsc [$]

Here $r$ is a real, positive number. The binomial coefficient is then defined by the multiplicative formula and can also be rewritten using the gamma function:

[$] \binom{k+r-1}{k} = \frac{(k+r-1)(k+r-2)\dotsm(r)}{k!} = \frac{\Gamma(k+r)}{k!\,\Gamma(r)}. [$]
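For non-integer $r$ the gamma-function form is convenient to evaluate on the log scale. The sketch below uses `math.lgamma` from the Python standard library; the function name `negbin_pmf`, the parameters $r = 2.5$, $\beta = 1.5$, and the truncation at 400 terms are illustrative assumptions.

```python
from math import exp, isclose, lgamma, log

def negbin_pmf(k: int, r: float, beta: float) -> float:
    """P(N = k) = Gamma(k + r) / (k! Gamma(r)) * beta^k / (1 + beta)^(r + k).

    Evaluated via logarithms to avoid overflow in the gamma function.
    """
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + k * log(beta) - (r + k) * log(1 + beta))

r, beta = 2.5, 1.5  # non-integer r, illustrative values

# Truncated sum of the probabilities; the neglected tail is negligible here.
total = sum(negbin_pmf(k, r, beta) for k in range(400))
print(total)
```

For integer $r$ the same function agrees with the binomial-coefficient form of the probability mass function.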

To show that the probability mass function adds up to one, we have, by the binomial series

[$] (1 + \beta)^{r} = (1 - (1-p))^{-r} =\sum_{k=0}^\infty(-1)^k\binom{-r}{k}(1-p)^k = (1 + \beta)^r \,\sum_{k=0}^\infty \operatorname{P}(N = k). [$]

Finally, the following recurrence relation holds:

[$]\begin{array}{l} (k+1) \operatorname{P} (k+1)- (1-p) \operatorname{P} (k) (k+r)=0, \\ \operatorname{P} (0) = p^r. \end{array} [$]
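As with the other distributions, the recurrence can be used to tabulate probabilities iteratively. The following sketch uses the hypothetical name `negbin_pmf_table` and compares the result with the closed form for the illustrative values $r = 3$, $p = 0.4$.

```python
from math import comb, isclose

def negbin_pmf_table(r: float, p: float, kmax: int) -> list[float]:
    """P(0), ..., P(kmax) via (k + 1) P(k + 1) = (1 - p)(k + r) P(k), P(0) = p^r."""
    probs = [p**r]
    for k in range(kmax):
        probs.append(probs[-1] * (1 - p) * (k + r) / (k + 1))
    return probs

table = negbin_pmf_table(3, 0.4, 30)

# Closed form for integer r = 3: C(k + 2, k) (1 - p)^k p^3.
direct = [comb(k + 2, k) * 0.6**k * 0.4**3 for k in range(31)]
print(all(isclose(a, b) for a, b in zip(table, direct)))
```

The recurrence never evaluates a binomial coefficient, so it works unchanged for non-integer $r$.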