Conditional Probability

In probability theory, conditional probability is a measure of the probability of an event given that (by assumption, presumption, assertion or evidence) another event has occurred.[1] If the event of interest is [math]A[/math] and the event [math]B[/math] is known or assumed to have occurred, "the conditional probability of [math]A[/math] given [math]B[/math]", or "the probability of [math]A[/math] under the condition [math]B[/math]", is usually written as [math]\operatorname{P}(A|B)[/math]. For example, the probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person has a cold, then they are much more likely to be coughing; the conditional probability of coughing given a cold might be a much higher 75%.

Definition

Given two events [math]A[/math] and [math]B[/math] from the sigma-field of a probability space with [math]\operatorname{P}(B)\gt0[/math], the conditional probability of [math]A[/math] given [math]B[/math] is defined as the quotient of the probability of the joint occurrence of [math]A[/math] and [math]B[/math] and the probability of [math]B[/math]:

[[math]]\operatorname{P}(A|B) = \frac{\operatorname{P}(A \cap B)}{\operatorname{P}(B)}[[/math]]

This may be visualized as restricting the sample space to [math]B[/math]: once the outcomes are restricted to [math]B[/math], this set serves as the new sample space, and probabilities are rescaled by [math]\operatorname{P}(B)[/math] so that they again sum to one.

Note that this is a definition, not a theoretical result. We simply denote the quantity [math]\operatorname{P}(A\cap B)/\operatorname{P}(B)[/math] by [math]\operatorname{P}(A|B)[/math] and call it the conditional probability of [math]A[/math] given [math]B[/math].
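
As a minimal illustration of the definition, the following Python sketch (the function name and the coin-toss events are ours, purely for illustration) computes [math]\operatorname{P}(A|B)[/math] on a finite sample space of equally likely outcomes by restricting the outcomes to [math]B[/math] and counting:

  from fractions import Fraction

  def conditional_probability(sample_space, A, B):
      # P(A|B) = P(A and B) / P(B); with equally likely outcomes this is
      # |A and B| / |B|, i.e. counting within the restricted sample space B.
      restricted = [w for w in sample_space if B(w)]
      if not restricted:
          raise ValueError("P(B) = 0, so P(A|B) is undefined")
      return Fraction(sum(1 for w in restricted if A(w)), len(restricted))

  # Example: two fair coin tosses; P(both heads | at least one head) = 1/3.
  tosses = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
  print(conditional_probability(tosses,
                                A=lambda w: w == ("H", "H"),
                                B=lambda w: "H" in w))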

Example

Suppose that somebody secretly rolls two fair six-sided dice, and we must predict the outcome. Let [math]A[/math] be the value rolled on die 1 and let [math]B[/math] be the value rolled on die 2.

What is the probability that [math]A=2 [/math] ?

Table 1 shows the sample space of 36 outcomes. Clearly, [math]A =2 [/math] in exactly 6 of the 36 outcomes, thus [math]\operatorname{P}(A=2)=1/6[/math].

Table 1 (each cell gives the sum A + B)
  A \ B    1    2    3    4    5    6
      1    2    3    4    5    6    7
      2    3    4    5    6    7    8
      3    4    5    6    7    8    9
      4    5    6    7    8    9   10
      5    6    7    8    9   10   11
      6    7    8    9   10   11   12

What is the probability [math]A+B \leq 5 [/math] ?

Table 2 shows that [math]A+B \leq 5[/math] for exactly 10 of the same 36 outcomes, thus [math]\operatorname{P}(A +B \leq 5) = 10/36[/math].

Table 2 (each cell gives the sum A + B; the 10 outcomes with A + B ≤ 5 form the upper-left triangle)
  A \ B    1    2    3    4    5    6
      1    2    3    4    5    6    7
      2    3    4    5    6    7    8
      3    4    5    6    7    8    9
      4    5    6    7    8    9   10
      5    6    7    8    9   10   11
      6    7    8    9   10   11   12

What is the probability that [math]A = 2 [/math] given that [math]A + B \leq 5[/math] ?

Table 3 shows that for 3 of these 10 outcomes, [math]A = 2[/math], thus the conditional probability [math]\operatorname{P}(A = 2 | A + B \leq 5) = 3/10[/math].

Table 3 (each cell gives the sum A + B; of the 10 outcomes with A + B ≤ 5, exactly 3 lie in the row A = 2)
  A \ B    1    2    3    4    5    6
      1    2    3    4    5    6    7
      2    3    4    5    6    7    8
      3    4    5    6    7    8    9
      4    5    6    7    8    9   10
      5    6    7    8    9   10   11
      6    7    8    9   10   11   12
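
The three probabilities above can also be checked by direct enumeration of the 36 equally likely outcomes, for instance with the following Python sketch (ours, not part of the original example):

  from fractions import Fraction
  from itertools import product

  # All 36 equally likely outcomes (value of die 1, value of die 2).
  outcomes = list(product(range(1, 7), repeat=2))
  n = len(outcomes)

  p_a     = Fraction(sum(1 for a, b in outcomes if a == 2), n)                # P(A = 2) = 1/6
  p_b     = Fraction(sum(1 for a, b in outcomes if a + b <= 5), n)            # P(A + B <= 5) = 10/36
  p_joint = Fraction(sum(1 for a, b in outcomes if a == 2 and a + b <= 5), n) # P(A = 2 and A + B <= 5) = 3/36

  # P(A = 2 | A + B <= 5) = P(A = 2 and A + B <= 5) / P(A + B <= 5)
  print(p_a, p_b, p_joint / p_b)  # 1/6, 5/18 (= 10/36), 3/10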

Use in inference

In statistical inference, the conditional probability is an update of the probability of an event based on new information.[2] Incorporating the new information can be done as follows:[1]

  • Let [math]A[/math], the event of interest, be in the sample space.
  • The occurrence of the event [math]A[/math], knowing that event [math]B[/math] has or will have occurred, means the occurrence of [math]A[/math] restricted to [math]B[/math], i.e. [math]A \cap B[/math].
  • Without the knowledge of the occurrence of [math]B[/math], the information about the occurrence of [math]A[/math] would simply be [math]\operatorname{P}(A)[/math].
  • The probability of [math]A[/math], knowing that event [math]B[/math] has or will have occurred, will be the probability of [math]A \cap B[/math] compared with [math]\operatorname{P}(B)[/math], the probability that [math]B[/math] has occurred.
  • This results in [math]\operatorname{P}(A|B) = \operatorname{P}(A \cap B )/\operatorname{P}(B)[/math] whenever [math]\operatorname{P}(B)\gt0[/math] and 0 otherwise.

The phraseology "evidence" or "information" is generally used in the Bayesian interpretation of probability. The conditioning event is interpreted as evidence for the conditioned event. That is, [math]\operatorname{P}(A)[/math] is the probability of [math]A[/math] before accounting for evidence [math]E[/math], and [math]\operatorname{P}(A|E)[/math] is the probability of [math]A[/math] after having accounted for evidence [math]E[/math] or after having updated [math]\operatorname{P}(A)[/math].
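
As a rough numerical sketch of such an update, consider the cough-and-cold illustration from the introduction; the 2% prevalence of a cold used below is an assumption added here purely for illustration:

  # Illustrative numbers only; the 2% prevalence of a cold is assumed, not from the text.
  p_cold = 0.02              # prior P(A): probability that a given person has a cold
  p_cough = 0.05             # P(E): probability of a cough on a given day
  p_cough_given_cold = 0.75  # P(E|A), as in the introduction

  # P(A and E) = P(E|A) * P(A); conditioning on the evidence E then gives the update:
  p_cold_given_cough = p_cough_given_cold * p_cold / p_cough
  print(p_cold_given_cough)  # ~0.3: the prior 0.02 is revised upward after seeing the evidence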

Common fallacies

Assuming conditional probability is of similar size to its inverse

In general, it cannot be assumed that [math]\operatorname{P}(A|B) \approx \operatorname{P}(B|A)[/math]. This can be an insidious error, even for those who are highly conversant with statistics.[3] The relationship between [math]\operatorname{P}(A|B)[/math] and [math]\operatorname{P}(B|A)[/math] is given by Bayes' theorem:

[[math]] \operatorname{P}(B|A) = \frac{\operatorname{P}(A|B) \operatorname{P}(B)}{\operatorname{P}(A)} \Leftrightarrow \frac{\operatorname{P}(B|A)}{\operatorname{P}(A|B)} = \frac{\operatorname{P}(B)}{\operatorname{P}(A)}. [[/math]]

That is, [math]\operatorname{P}(A|B) \approx \operatorname{P}(B|A)[/math] only if [math]\operatorname{P}(B)/\operatorname{P}(A) \approx 1[/math], or equivalently, [math]\operatorname{P}(A) \approx \operatorname{P}(B)[/math]. Alternatively, noting that [math]A \cap B = B \cap A[/math] and applying the definition of conditional probability:

[[math]]\operatorname{P}(A|B)\operatorname{P}(B) = \operatorname{P}(A \cap B) = \operatorname{P}(B \cap A) = \operatorname{P}(B|A)\operatorname{P}(A)[[/math]]

Rearranging gives the result.
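
For instance, in the dice example above, [math]\operatorname{P}(A = 2 | A + B \leq 5) = 3/10[/math], whereas [math]\operatorname{P}(A + B \leq 5 | A = 2) = 3/6 = 1/2[/math], since of the 6 outcomes with [math]A = 2[/math] exactly 3 satisfy [math]A + B \leq 5[/math]. Bayes' theorem recovers one from the other: [math](3/10)\cdot(10/36)/(1/6) = 1/2[/math].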

Assuming marginal and conditional probabilities are of similar size

In general, it cannot be assumed that [math]\operatorname{P}(A) \approx \operatorname{P}(A|B)[/math]. These probabilities are linked through the law of total probability:

[[math]]\operatorname{P}(A) = \sum_n \operatorname{P}(A \cap B_n) = \sum_n \operatorname{P}(A|B_n)\operatorname{P}(B_n)[[/math]]

where the events [math](B_n)[/math] form a countable partition of the sample space.
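
For instance, in the dice example above the events [math]\{A = a\}[/math], [math]a = 1, \dots, 6[/math], partition the sample space, and

[[math]]\operatorname{P}(A + B \leq 5) = \sum_{a=1}^{6} \operatorname{P}(A + B \leq 5 | A = a)\operatorname{P}(A = a) = \frac{1}{6}\left(\frac{4}{6}+\frac{3}{6}+\frac{2}{6}+\frac{1}{6}+0+0\right) = \frac{10}{36}.[[/math]]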

This fallacy may arise through selection bias.[4] For example, in the context of a medical claim, let [math]S_C[/math] be the event that a sequela (chronic disease) [math]S[/math] occurs as a consequence of circumstance (acute condition) [math]C[/math]. Let [math]H[/math] be the event that an individual seeks medical help. Suppose that in most cases [math]C[/math] does not cause [math]S[/math], so that [math]\operatorname{P}(S_{C})[/math] is low. Suppose also that medical attention is sought only if [math]S[/math] has occurred due to [math]C[/math]. From experience with patients, a doctor may therefore erroneously conclude that [math]\operatorname{P}(S_C)[/math] is high. The probability actually observed by the doctor is [math]\operatorname{P}(S_C|H)[/math].
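
A small sketch with made-up counts (all numbers below are assumptions chosen for illustration, not taken from the cited example) shows how far the two probabilities can diverge:

  # Hypothetical counts, purely to illustrate the selection effect described above.
  n_condition = 10_000  # people who experience the acute condition C
  n_sequela = 100       # of those, people who develop the sequela S
  n_seek_help = 100     # help is sought only when S has occurred, so all of these have S

  p_sequela = n_sequela / n_condition             # P(S_C)     = 0.01: low
  p_sequela_given_help = n_sequela / n_seek_help  # P(S_C | H) = 1.0: what the doctor observes
  print(p_sequela, p_sequela_given_help)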

Notes

  1. Gut, Allan (2013). Probability: A Graduate Course (2nd ed.). New York, NY: Springer. ISBN 978-1-4614-4707-8.
  2. Casella, George; Berger, Roger L. (2002). Statistical Inference. Duxbury Press. ISBN 0-534-24312-6.
  3. Paulos, J. A. (1988). Innumeracy: Mathematical Illiteracy and its Consequences. Hill and Wang. ISBN 0-8090-7447-8 (p. 63 et seq.).
  4. Bruss, F. Thomas (2007). "Der Wyatt Earp Effekt". Spektrum der Wissenschaft, March 2007.
