Combinatorial Probability

Basic combinatorial probability focuses on computing probabilities of events associated with finite or countable discrete structures. This page focuses primarily on urn problems, which are elementary combinatorial probability problems solved with basic enumerative techniques. These urn problems and their solutions serve as a template for attacking other problems involving discrete random variables and their associated events.

Permutations

The notion of permutation relates to the act of arranging all the members of a set into some sequence or order, or if the set is already ordered, rearranging (reordering) its elements, a process called permuting. These differ from combinations, which are selections of some members of a set where order is disregarded. For example, written as tuples, there are six permutations of the set {1,2,3}, namely: (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), and (3,2,1). These are all the possible orderings of this three element set. As another example, an anagram of a word, all of whose letters are different, is a permutation of its letters. In this example, the letters are already ordered in the original word and the anagram is a reordering of the letters.

The number of permutations of [math]n[/math] distinct objects is [math]n![/math], which means the product of all positive integers less than or equal to [math]n[/math].
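As a quick check of this count, the following Python sketch enumerates the permutations of the set {1,2,3} with the standard library and compares the total against the factorial. The helper name `count_permutations` is an illustrative choice and does not come from the source.

```python
from itertools import permutations
from math import factorial


def count_permutations(items):
    """Count the orderings of distinct items by brute-force enumeration."""
    return sum(1 for _ in permutations(items))


# The six orderings of the three-element set used in the text.
orderings = list(permutations([1, 2, 3]))
print(orderings)

# The enumerated count agrees with n! for n = 3.
print(count_permutations([1, 2, 3]) == factorial(3))
```

Enumeration is only practical for small sets, since the count grows factorially; for larger [math]n[/math] one would compute [math]n![/math] directly.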

Permutations with repetition

Ordered arrangements of length [math]n[/math] of the elements of a set [math]S[/math], where repetition is allowed, are called [math]n[/math]-tuples. They have sometimes been referred to as permutations with repetition, although they are not permutations in general. They are also called words over the alphabet [math]S[/math] in some contexts. If the set [math]S[/math] has [math]k[/math] elements, the number of [math]n[/math]-tuples over [math]S[/math] is [math]k^n[/math]. This count assumes no restriction on how often an element can appear in an [math]n[/math]-tuple; if such restrictions are imposed, the formula is no longer valid.
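The count [math]k^n[/math] can be illustrated with a small sketch that lists all words of length 3 over a two-letter alphabet. The alphabet "ab" and the helper name `count_tuples` are assumptions made for the example.

```python
from itertools import product


def count_tuples(alphabet, n):
    """Count length-n sequences over the alphabet, repetition allowed."""
    return sum(1 for _ in product(alphabet, repeat=n))


# All words of length 3 over the alphabet {a, b}.
words = ["".join(w) for w in product("ab", repeat=3)]
print(words)

# The enumerated count agrees with k^n for k = 2, n = 3.
print(count_tuples("ab", 3) == 2 ** 3)
```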

Combinations

A combination is a way of selecting items from a collection, such that (unlike permutations) the order of selection does not matter. In small cases it is possible to list the combinations and count them directly. For example, given three fruits, say an apple, an orange and a pear, there are three combinations of two that can be drawn from this set: an apple and a pear; an apple and an orange; or a pear and an orange. More formally, a [math]k[/math]-combination of a set [math]S[/math] is a subset of [math]k[/math] distinct elements of [math]S[/math]. If the set has [math]n[/math] elements, the number of [math]k[/math]-combinations is equal to the binomial coefficient

[[math]] \binom nk = \frac{n(n-1)\dotsb(n-k+1)}{k(k-1)\dotsb1}[[/math]]

, which can be written using factorials as [math]\textstyle\frac{n!}{k!(n-k)!}[/math] whenever [math]k\leq n[/math], and which is zero when [math]k \gt n[/math]. The set of all [math]k[/math]-combinations of a set [math]S[/math] is sometimes denoted by [math]\textstyle\binom Sk\,[/math].

Combinations refer to the combination of [math]n[/math] things taken [math]k[/math] at a time without repetition. To refer to combinations in which repetition is allowed, the terms [math]k[/math]-selection,^[1] [math]k[/math]-multiset,^[2] or [math]k[/math]-combination with repetition are often used.^[3] If, in the above example, it was possible to have two of any one kind of fruit there would be 3 more 2-selections: one with two apples, one with two oranges, and one with two pears.
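The fruit example can be reproduced with Python's standard library, which distinguishes selections without repetition from selections with repetition. This is only an illustrative sketch; the variable names are not from the source.

```python
from itertools import combinations, combinations_with_replacement

fruits = ["apple", "orange", "pear"]

# 2-combinations: order ignored, no fruit chosen twice.
without_repetition = list(combinations(fruits, 2))

# 2-selections: order ignored, a fruit may be chosen twice.
with_repetition = list(combinations_with_replacement(fruits, 2))

print(len(without_repetition))  # 3
print(len(with_repetition))     # 6: the 3 above plus 3 pairs of identical fruits
```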

Although the set of three fruits was small enough to write a complete list of combinations, with large sets this becomes impractical. For example, a poker hand can be described as a 5-combination ([math]k[/math] = 5) of cards from a 52 card deck ([math]n[/math] = 52). The 5 cards of the hand are all distinct, and the order of cards in the hand does not matter. There are 2,598,960 such combinations, and the chance of drawing any one hand at random is 1 / 2,598,960.
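The poker-hand count can be checked without listing any hands. The sketch below assumes Python 3.8 or later, where `math.comb` is available.

```python
from math import comb

# Number of 5-card hands from a 52-card deck.
hands = comb(52, 5)
print(hands)  # 2598960

# Chance of drawing one specific hand at random.
print(1 / hands)
```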

Number of [math]k[/math]-combinations

The number of [math]k[/math]-combinations from a given set [math]S[/math] of [math]n[/math] elements is often denoted in elementary combinatorics texts by [math]C(n,k)[/math], or by a variation such as [math]C^n_k[/math], [math]{}_nC_k[/math], [math]{}^nC_k[/math], [math]C_{n,k}[/math] or even [math]C_n^k[/math] (the latter form was standard in French, Romanian, Russian, Chinese^[4] and Polish texts). The same number however occurs in many other mathematical contexts, where it is denoted by [math]\tbinom nk[/math] (often read as "[math]n[/math] choose [math]k[/math]"); notably it occurs as a coefficient in the binomial formula, hence its name binomial coefficient. One can define [math]\tbinom nk[/math] for all natural numbers [math]k[/math] at once by the relation

[[math]](1+X)^n=\sum_{k\geq0}\binom nk X^k[[/math]]

, from which it is clear that [math]\tbinom n0=\tbinom nn=1[/math] and [math]\tbinom nk=0[/math] for [math]k \gt n[/math]. To see that these coefficients count [math]k[/math]-combinations from [math]S[/math], one can first consider a collection of [math]n[/math] distinct variables [math]X_s[/math] labeled by the elements [math]s[/math] of [math]S[/math], and expand the product over all elements of [math]S[/math]:

[[math]]\prod_{s\in S}(1+X_s)[[/math]]

It has [math]2^n[/math] distinct terms corresponding to all the subsets of [math]S[/math], each subset giving the product of the corresponding variables [math]X_s[/math]. Now setting all of the [math]X_s[/math] equal to the unlabeled variable [math]X[/math], so that the product becomes [math](1 + X)^n[/math], the term for each [math]k[/math]-combination from [math]S[/math] becomes [math]X^k[/math], so that the coefficient of that power in the result equals the number of such [math]k[/math]-combinations.

Binomial coefficients can be computed explicitly in various ways. To get all of them for the expansions up to [math](1+X)^n[/math], one can use (in addition to the basic cases already given) the recursion relation

[[math]]\binom nk=\binom{n-1}{k-1}+\binom{n-1}k[[/math]]

, for [math]0 \lt k \lt n[/math], which follows from [math](1+X)^n = (1+X)^{n-1}(1+X)[/math]; this leads to the construction of Pascal's triangle.
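The recursion can be turned into a short routine that builds Pascal's triangle row by row. This is a minimal sketch; the function name `pascal_rows` is hypothetical.

```python
def pascal_rows(n):
    """Return rows 0..n of Pascal's triangle.

    Interior entries use C(m, k) = C(m-1, k-1) + C(m-1, k);
    edge entries are the basic cases C(m, 0) = C(m, m) = 1.
    """
    rows = [[1]]
    for m in range(1, n + 1):
        prev = rows[-1]
        interior = [prev[k - 1] + prev[k] for k in range(1, m)]
        rows.append([1] + interior + [1])
    return rows


for row in pascal_rows(4):
    print(row)
```

Row [math]m[/math] of the result lists the binomial coefficients for [math]k = 0, \ldots, m[/math], so the last row printed here is 1, 4, 6, 4, 1.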

For determining an individual binomial coefficient, it is more practical to use the formula

[[math]]\binom nk = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}[[/math]]

The numerator gives the number of [math]k[/math]-permutations of [math]n[/math], i.e., of sequences of [math]k[/math] distinct elements of [math]S[/math], while the denominator gives the number of such [math]k[/math]-permutations that give the same [math]k[/math]-combination when the order is ignored.

When [math]k[/math] exceeds [math]n/2[/math], the above formula contains factors common to the numerator and the denominator, and canceling them out gives the relation

[[math]] \binom nk = \binom n{n-k}[[/math]]

, for [math] 0 \leq k \leq n[/math]. This expresses a symmetry that is evident from the binomial formula, and can also be understood in terms of [math]k[/math]-combinations by taking the complement of such a combination, which is an [math](n-k)[/math]-combination.
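A single coefficient can be computed from the multiplicative formula, using the symmetry relation to keep the number of factors small. The sketch below is one possible arrangement, not a prescribed method: it multiplies the same numerator factors in ascending order so that every intermediate value is itself a binomial coefficient and the integer division is exact. The function name `binomial` is an illustrative choice.

```python
def binomial(n, k):
    """Compute C(n, k) with the multiplicative formula."""
    if k < 0 or k > n:
        return 0
    # Symmetry C(n, k) = C(n, n - k): use the smaller of k and n - k.
    k = min(k, n - k)
    result = 1
    for i in range(1, k + 1):
        # After step i, result equals C(n - k + i, i), so // is exact.
        result = result * (n - k + i) // i
    return result


print(binomial(52, 5))  # 2598960
print(binomial(10, 7) == binomial(10, 3))
```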

Finally there is a formula which exhibits this symmetry directly, and has the merit of being easy to remember:

[[math]] \binom nk = \frac{n!}{k!(n-k)!}[[/math]]

, where [math]n![/math] denotes the factorial of [math]n[/math]. It is obtained from the previous formula by multiplying denominator and numerator by [math](n-k)![/math], so it is certainly inferior as a method of computation to that formula.

The last formula can be understood directly, by considering the [math]n![/math] permutations of all the elements of [math]S[/math]. Each such permutation gives a [math]k[/math]-combination by selecting its first [math]k[/math] elements. There are many duplicate selections: any combined permutation of the first [math]k[/math] elements among each other, and of the final [math](n-k)[/math] elements among each other produces the same combination; this explains the division in the formula.

From the above formulas follow relations between adjacent numbers in Pascal's triangle in all three directions:

[[math]] \binom nk = \begin{cases} \binom n{k-1} \frac {n-k+1}k &\quad \text{if } k \gt 0 \\ \binom {n-1}k \frac n{n-k} &\quad \text{if } k \lt n \\ \binom {n-1}{k-1} \frac nk &\quad \text{if } n, k \gt 0 \end{cases} [[/math]]

Together with the basic cases [math]\tbinom n0=1=\tbinom nn[/math], these allow successive computation of respectively all numbers of combinations from the same set (a row in Pascal's triangle), of [math]k[/math]-combinations of sets of growing sizes, and of combinations with a complement of fixed size [math]n-k[/math].

Urn Problems

An urn problem is an idealized mental exercise in which some objects of real interest (such as atoms, people, cars, etc.) are represented as colored balls in an urn or other container. One pretends to remove one or more balls from the urn; the goal is to determine the probability of drawing one color or another, or some other properties.

In this basic urn model in probability theory, the urn contains [math]x[/math] green and [math]y[/math] red balls, well-mixed together. One ball is drawn randomly from the urn and its color observed; it is then placed back in the urn (or not), and the selection process is repeated.

Possible questions that can be answered in this model are:

Can I infer the proportion of green and red balls from [math]n[/math] observations? With what degree of confidence?
Knowing [math]x[/math] and [math]y[/math], what is the probability of drawing a specific sequence (e.g. one green followed by one red)?
If I only observe [math]n[/math] balls, how sure can I be that there are no green balls? (A variation on the first question)

Example: number of successful draws (trials)

Assume the basic urn model (as described above). If [math]X[/math] denotes the number of green balls (successes) obtained in [math]n[/math] draws with replacement and [math]p = x/(x+y)[/math] denotes the probability of drawing a green ball on a single draw, then, for [math]k = 0, 1, \ldots, n[/math],

[[math]] \begin{equation} \label{urn-example-binomial} \operatorname{P}(X = k) = \binom{n}{k} p^k (1-p)^{n-k}. \end{equation} [[/math]]

The probability distribution described by \ref{urn-example-binomial} is the binomial distribution.
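As an illustration, the formula can be evaluated for a hypothetical urn with 3 green and 7 red balls and 5 draws with replacement. These numbers and the function name `binomial_pmf` are assumptions made for the example, not values from the source.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent draws with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical urn: 3 green and 7 red balls, so p = 3 / 10.
p = 3 / (3 + 7)
probs = [binomial_pmf(k, 5, p) for k in range(6)]
print(probs)

# The probabilities over k = 0..n sum to 1 (up to rounding).
print(sum(probs))
```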

Example: number of successful draws with multiple colors

Suppose we have an urn with colored balls and the number of distinct colors is [math]k[/math]. One can think of the colors as sorting the balls into [math]k[/math] categories, indexed as color 1, color 2, ..., color [math]k[/math]. What is the probability of drawing [math]x_i[/math] balls of color [math]i[/math] ([math]i=1,\ldots,k[/math]), where [math]x_1 + \cdots + x_k = n[/math], given [math]n[/math] total draws with replacement? If [math]X_i[/math] is the random variable representing the number of balls drawn of color [math]i[/math] and [math]p_i[/math] denotes the probability of drawing a ball of color [math]i[/math] on a single draw, then

[[math]] \begin{equation} \label{urn-example-multinomial} \operatorname{P}[X_1= x_1,\ldots,X_k = x_k] = \frac{n!}{x_1!\cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}. \end{equation} [[/math]]

The probability distribution described by \ref{urn-example-multinomial} is called a multinomial distribution.
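The multinomial formula can be evaluated directly. The sketch below assumes a hypothetical urn with three colors drawn with probabilities 0.5, 0.3 and 0.2; these values and the function name `multinomial_pmf` are illustrative only.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_k = x_k) for n = sum(counts) draws with replacement."""
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! ... x_k!); each division is exact.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p**x for p, x in zip(probs, counts))


# Hypothetical three-color urn; 4 draws giving counts (2, 1, 1).
print(multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2)))
```

With two colors the function reduces to the binomial formula, which gives a simple consistency check.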

Example: drawing without replacement

Define drawing a green marble as a success and drawing a red marble as a failure. If [math]N[/math] denotes the total number of marbles in the urn (see contingency table below) and [math]K[/math] the number of green marbles, then [math]N-K[/math] is the number of red marbles. Let [math]n[/math] be the number of marbles drawn. In this example, [math]X[/math] is the random variable whose outcome [math]k[/math] is the number of green marbles actually drawn in the experiment. This situation is illustrated by the following contingency table:

	drawn	not drawn	total
green marbles	[math]k[/math]	[math]K-k[/math]	[math]K[/math]
red marbles	[math]n-k[/math]	[math]N - K - n + k[/math]	[math]N - K[/math]
total	[math]n[/math]	[math]N - n[/math]	[math]N[/math]

Now, assume (for example) that there are 5 green and 45 red marbles in the urn. Standing next to the urn, you close your eyes and draw 10 marbles without replacement. What is the probability that exactly 4 of the 10 are green?

This problem is summarized by the following contingency table:

	drawn	not drawn	total
green marbles	[math]k[/math] = 4	[math]K-k[/math] = 1	[math]K[/math] = 5
red marbles	[math]n - k[/math] = 6	[math]N - K - n + k[/math] = 39	[math]N - K[/math] = 45
total	[math]n[/math] = 10	[math]N - n[/math] = 40	[math]N[/math] = 50

Generally speaking, the probability of drawing exactly [math]k[/math] green marbles can be calculated by the formula:

[[math]] \begin{equation} \label{urn-example-hyper} \operatorname{P}(X=k) ={{{K \choose k} {{N-K} \choose {n-k}}}\over {N \choose n}}. \end{equation} [[/math]]

The probability distribution described by \ref{urn-example-hyper} is called a hypergeometric distribution.
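Applied to the marble example above ([math]N = 50[/math], [math]K = 5[/math], [math]n = 10[/math], [math]k = 4[/math]), the formula can be evaluated exactly with rational arithmetic. The function name `hypergeom_pmf` is an illustrative choice, and the sketch assumes Python 3.8 or later for `math.comb`.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): k green among n marbles drawn without replacement
    from an urn holding N marbles, K of them green."""
    if k < 0 or k > n:
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


# 5 green and 45 red marbles, 10 drawn, exactly 4 green.
answer = hypergeom_pmf(4, 50, 5, 10)
print(answer)
print(float(answer))  # approximately 0.00396
```

So drawing exactly 4 green marbles among the 10 happens with probability of roughly 0.4 percent.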

Example: first successful draw

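Suppose that we have an urn with green and red balls, and let [math]p[/math] denote the probability of drawing a green ball on a single draw. If [math]X[/math] denotes the smallest number of draws with replacement until one sees a green ball (a success), then, for [math]k = 1, 2, \ldots[/math],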

[[math]] \begin{equation} \label{urn-example-geom} \operatorname{P}(X = k) = (1-p)^{k-1}p. \end{equation} [[/math]]

The probability distribution described by \ref{urn-example-geom} is a geometric distribution.
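A short sketch of this formula follows, assuming a hypothetical single-draw success probability of 0.3. The value and the function name `geometric_pmf` are not from the source.

```python
def geometric_pmf(k, p):
    """P(X = k): the first green ball appears on draw k (k = 1, 2, ...)."""
    if k < 1:
        return 0.0
    # k - 1 failures followed by one success.
    return (1 - p) ** (k - 1) * p


p = 0.3  # hypothetical probability of green on a single draw
print([geometric_pmf(k, p) for k in range(1, 6)])

# The probabilities over k = 1, 2, ... approach a total of 1.
print(sum(geometric_pmf(k, p) for k in range(1, 200)))
```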

Notes

[1] Ryser 1963, p. 7; also referred to as an unordered selection.
[2] Mazur 2010, p. 10.
[3] When the term combination is used to refer to either situation (as in Brualdi 2010), care must be taken to clarify whether sets or multisets are being discussed.
[4] High School Textbook for full-time student (Required) Mathematics Book II B (in Chinese) (2nd ed.). China: People's Education Press. June 2006. pp. 107–116. ISBN 978-7-107-19616-4.

References

Wikipedia contributors. "Combination". Wikipedia. Wikipedia. Retrieved 28 January 2022.
Wikipedia contributors. "Permutation". Wikipedia. Wikipedia. Retrieved 28 January 2022.
Wikipedia contributors. "Urn problem". Wikipedia. Wikipedia. Retrieved 28 January 2022.
