# Conditional Probability

In probability theory, conditional probability is a measure of the probability of an event given that (by assumption, presumption, assertion or evidence) another event has occurred.[1] If the event of interest is $A$ and the event $B$ is known or assumed to have occurred, "the conditional probability of $A$ given $B$", or "the probability of $A$ under the condition $B$", is usually written as $\operatorname{P}(A|B)$. For example, the probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person has a cold, then they are much more likely to be coughing: the conditional probability of coughing given that the person has a cold might be a much higher 75%.

## Definition

Given two events $A$ and $B$ from the sigma-field of a probability space with $\operatorname{P}(B)>0$, the conditional probability of $A$ given $B$ is defined as the quotient of the probability of the joint of events $A$ and $B$, and the probability of $B$:

[$]\operatorname{P}(A|B) = \frac{\operatorname{P}(A \cap B)}{\operatorname{P}(B)}[$]

This may be visualized as restricting the sample space to $B$. The logic behind this equation is that if the outcomes are restricted to $B$, this set serves as the new sample space.

Note that this is a definition, not a theoretical result. We simply denote the quantity $\operatorname{P}(A\cap B)/\operatorname{P}(B)$ by $\operatorname{P}(A|B)$ and call it the conditional probability of $A$ given $B$.
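As a minimal sketch of this definition, the quotient can be computed directly on a finite sample space of equally likely outcomes. The function name and the single-die example below are illustrative, not from the text:

```python
from fractions import Fraction

def conditional_probability(outcomes, event_a, event_b):
    """Compute P(A|B) = P(A ∩ B) / P(B) on a finite sample space of
    equally likely outcomes; event_a and event_b are predicates."""
    outcomes = list(outcomes)
    b = [o for o in outcomes if event_b(o)]        # restrict to B
    if not b:
        raise ValueError("P(B) = 0: conditional probability undefined")
    a_and_b = [o for o in b if event_a(o)]         # outcomes in A ∩ B
    return Fraction(len(a_and_b), len(b))

# One fair die: P(even | value > 3) = |{4, 6}| / |{4, 5, 6}| = 2/3
p = conditional_probability(range(1, 7), lambda o: o % 2 == 0, lambda o: o > 3)
```

Note that the quotient is left as an exact `Fraction`, which matches how conditional probabilities are usually reported in worked examples.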

## Example

Suppose that somebody secretly rolls two fair six-sided dice, and we must predict the outcome. Let $A$ be the value rolled on die 1 and let $B$ be the value rolled on die 2.

### What is the probability that $A=2$ ?

Table 1 shows the sample space of 36 outcomes. Clearly, $A =2$ in exactly 6 of the 36 outcomes, thus $\operatorname{P}(A=2)=1/6$.

Table 1. The sum $A+B$ for each of the 36 equally likely outcomes; the bold row contains the 6 outcomes with $A=2$.

| $+$ | $B=1$ | $B=2$ | $B=3$ | $B=4$ | $B=5$ | $B=6$ |
|---|---|---|---|---|---|---|
| $A=1$ | 2 | 3 | 4 | 5 | 6 | 7 |
| $A=2$ | **3** | **4** | **5** | **6** | **7** | **8** |
| $A=3$ | 4 | 5 | 6 | 7 | 8 | 9 |
| $A=4$ | 5 | 6 | 7 | 8 | 9 | 10 |
| $A=5$ | 6 | 7 | 8 | 9 | 10 | 11 |
| $A=6$ | 7 | 8 | 9 | 10 | 11 | 12 |

### What is the probability $A+B \leq 5$ ?

Table 2 shows that $A+B \leq 5$ for exactly 10 of the same 36 outcomes, thus $\operatorname{P}(A +B \leq 5) = 10/36$.

Table 2. The sum $A+B$ for each outcome; the 10 outcomes with $A+B \leq 5$ are in bold.

| $+$ | $B=1$ | $B=2$ | $B=3$ | $B=4$ | $B=5$ | $B=6$ |
|---|---|---|---|---|---|---|
| $A=1$ | **2** | **3** | **4** | **5** | 6 | 7 |
| $A=2$ | **3** | **4** | **5** | 6 | 7 | 8 |
| $A=3$ | **4** | **5** | 6 | 7 | 8 | 9 |
| $A=4$ | **5** | 6 | 7 | 8 | 9 | 10 |
| $A=5$ | 6 | 7 | 8 | 9 | 10 | 11 |
| $A=6$ | 7 | 8 | 9 | 10 | 11 | 12 |

### What is the probability that $A = 2$ given that $A + B \leq 5$ ?

Table 3 shows that for 3 of these 10 outcomes, $A = 2$, thus the conditional probability $\operatorname{P}(A = 2 \,|\, A + B \leq 5) = 3/10$.

Table 3. Among the 10 outcomes with $A+B \leq 5$, the 3 outcomes with $A=2$ are in bold.

| $+$ | $B=1$ | $B=2$ | $B=3$ | $B=4$ | $B=5$ | $B=6$ |
|---|---|---|---|---|---|---|
| $A=1$ | 2 | 3 | 4 | 5 | 6 | 7 |
| $A=2$ | **3** | **4** | **5** | 6 | 7 | 8 |
| $A=3$ | 4 | 5 | 6 | 7 | 8 | 9 |
| $A=4$ | 5 | 6 | 7 | 8 | 9 | 10 |
| $A=5$ | 6 | 7 | 8 | 9 | 10 | 11 |
| $A=6$ | 7 | 8 | 9 | 10 | 11 | 12 |
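The three answers above can be checked by enumerating the 36 outcomes directly. This sketch keeps exact fractions throughout:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (A, B) of two fair six-sided dice.
space = list(product(range(1, 7), repeat=2))

p_a2 = Fraction(sum(a == 2 for a, b in space), len(space))        # 1/6
p_sum5 = Fraction(sum(a + b <= 5 for a, b in space), len(space))  # 10/36

# Conditioning on A + B <= 5 restricts the sample space to 10 outcomes.
restricted = [(a, b) for a, b in space if a + b <= 5]
p_a2_given = Fraction(sum(a == 2 for a, b in restricted), len(restricted))  # 3/10
```

The last step is exactly the "new sample space" reading of the definition: the denominator shrinks from 36 outcomes to the 10 outcomes in the conditioning event.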

## Use in inference

In statistical inference, the conditional probability is an update of the probability of an event based on new information.[2] Incorporating the new information can be done as follows:[1]

• Let $A$, the event of interest, be in the sample space.
• The occurrence of the event $A$ knowing that event $B$ has or will have occurred means the occurrence of $A$ restricted to $B$, i.e. $A \cap B$.
• Without knowledge of the occurrence of $B$, the information about the occurrence of $A$ would simply be $\operatorname{P}(A)$.
• The probability of $A$ knowing that event $B$ has or will have occurred will be the probability of $A \cap B$ relative to $\operatorname{P}(B)$, the probability that $B$ has occurred.
• This results in $\operatorname{P}(A|B) = \operatorname{P}(A \cap B)/\operatorname{P}(B)$ whenever $\operatorname{P}(B)>0$, and 0 otherwise.

The phraseology "evidence" or "information" is generally used in the Bayesian interpretation of probability. The conditioning event is interpreted as evidence for the conditioned event. That is, $\operatorname{P}(A)$ is the probability of $A$ before accounting for evidence $E$, and $\operatorname{P}(A|E)$ is the probability of $A$ after having accounted for evidence $E$ or after having updated $\operatorname{P}(A)$.

## Common fallacies

### Assuming conditional probability is of similar size to its inverse

In general, it cannot be assumed that $\operatorname{P}(A|B) \approx \operatorname{P}(B|A)$. This can be an insidious error, even for those who are highly conversant with statistics.[3] The relationship between $\operatorname{P}(A|B)$ and $\operatorname{P}(B|A)$ is given by Bayes' theorem:

[$] \operatorname{P}(B|A) = \frac{\operatorname{P}(A|B) \operatorname{P}(B)}{\operatorname{P}(A)} \Leftrightarrow \frac{\operatorname{P}(B|A)}{\operatorname{P}(A|B)} = \frac{\operatorname{P}(B)}{\operatorname{P}(A)}. [$]

That is, $\operatorname{P}(A|B) \approx \operatorname{P}(B|A)$ only if $\operatorname{P}(B)/\operatorname{P}(A) \approx 1$, or equivalently, $\operatorname{P}(A) \approx \operatorname{P}(B)$. Alternatively, noting that $A \cap B = B \cap A$, and applying conditional probability:

[$]\operatorname{P}(A|B)\operatorname{P}(B) = \operatorname{P}(A \cap B) = \operatorname{P}(B \cap A) = \operatorname{P}(B|A)\operatorname{P}(A)[$]

Rearranging gives the result.
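A quick numerical check of this identity, with made-up values for $\operatorname{P}(A)$, $\operatorname{P}(B)$ and $\operatorname{P}(A|B)$ (the numbers are illustrative only, chosen so that all probabilities are consistent):

```python
from fractions import Fraction

# Hypothetical values, chosen only to illustrate the identity.
p_a = Fraction(1, 10)
p_b = Fraction(1, 2)
p_a_given_b = Fraction(3, 20)

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a

# The two conditionals differ exactly by the ratio P(B)/P(A) (here, 5),
# so swapping them silently is off by a factor of 5 in this example.
ratio = p_b_given_a / p_a_given_b
```

With these numbers, $\operatorname{P}(B|A) = 3/4$ while $\operatorname{P}(A|B) = 3/20$: the inverse conditional is five times larger, precisely because $\operatorname{P}(B)$ is five times $\operatorname{P}(A)$.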

### Assuming marginal and conditional probabilities are of similar size

In general, it cannot be assumed that $\operatorname{P}(A) \approx \operatorname{P}(A|B)$. These probabilities are linked through the law of total probability:

[$]\operatorname{P}(A) = \sum_n \operatorname{P}(A \cap B_n) = \sum_n \operatorname{P}(A|B_n)\operatorname{P}(B_n)[$]

where the events $(B_n)$ form a countable partition of the sample space.
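A sketch of the law of total probability with a hypothetical three-part partition (all numbers are invented for illustration):

```python
from fractions import Fraction

# Hypothetical partition B_1, B_2, B_3 of the sample space,
# with invented conditional probabilities P(A|B_n).
p_b = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
p_a_given_b = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

assert sum(p_b) == 1  # the B_n are exhaustive and mutually exclusive

# P(A) = sum_n P(A|B_n) P(B_n)
p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))
```

Here the marginal $\operatorname{P}(A) = 19/50$ differs from every one of the conditionals $\operatorname{P}(A|B_n)$, illustrating why the two cannot be assumed similar.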

This fallacy may arise through selection bias.[4] For example, in the context of a medical claim, let $S_C$ be the event that a sequela (chronic disease) $S$ occurs as a consequence of circumstance (acute condition) $C$. Let $H$ be the event that an individual seeks medical help. Suppose that in most cases, $C$ does not cause $S$ so $\operatorname{P}(S_{C})$ is low. Suppose also that medical attention is only sought if $S$ has occurred due to $C$. From experience of patients, a doctor may therefore erroneously conclude that $\operatorname{P}(S_C)$ is high. The actual probability observed by the doctor is $\operatorname{P}(S_C|H)$.
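The doctor's error can be made concrete with invented counts: in the population $\operatorname{P}(S_C)$ is small, yet every patient the doctor actually sees has $S$. All numbers below are hypothetical:

```python
from fractions import Fraction

# Invented counts for illustration only.
n_c = 10_000   # individuals experiencing the acute condition C
n_s = 200      # of those, cases where the sequela S follows
n_h = 200      # help is sought only when S has occurred

p_s_given_c = Fraction(n_s, n_c)   # P(S_C)     = 1/50: actually low
p_s_given_h = Fraction(n_s, n_h)   # P(S_C | H) = 1: what the doctor observes
```

The selection bias is in `n_h`: because only the 200 affected individuals seek help, the doctor's sample contains none of the 9,800 cases where $C$ did not cause $S$.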

## Notes

1. Gut, Allan (2013). *Probability: A Graduate Course* (2nd ed.). New York, NY: Springer. ISBN 978-1-4614-4707-8.
2. Casella, George; Berger, Roger L. (2002). *Statistical Inference*. Duxbury Press. ISBN 0-534-24312-6.
3. Paulos, J. A. (1988). *Innumeracy: Mathematical Illiteracy and Its Consequences*. Hill and Wang. ISBN 0-8090-7447-8 (p. 63 et seq.).
4. Bruss, F. Thomas (2007). "Der Wyatt-Earp-Effekt". *Spektrum der Wissenschaft*, March 2007.