# Conditional Probability

In probability theory, conditional probability is a measure of the probability of an event given that (by assumption, presumption, assertion or evidence) another event has occurred. If the event of interest is $A$ and the event $B$ is known or assumed to have occurred, "the conditional probability of $A$ given $B$", or "the probability of $A$ under the condition $B$", is usually written as $\operatorname{P}(A|B)$. For example, the probability that any given person has a cough on any given day may be only 5%. But if we know or assume that the person has a cold, then they are much more likely to be coughing. The conditional probability of coughing, given that the person has a cold, might be a much higher 75%.

## Definition

Given two events $A$ and $B$ from the sigma-field of a probability space, with $\operatorname{P}(B) \gt 0$, the conditional probability of $A$ given $B$ is defined as the quotient of the probability of the joint occurrence of $A$ and $B$ and the probability of $B$:

$$\operatorname{P}(A|B) = \frac{\operatorname{P}(A \cap B)}{\operatorname{P}(B)}$$

This may be visualized as restricting the sample space to $B$. The logic behind this equation is that if the outcomes are restricted to $B$, this set serves as the new sample space.

Note that this is a definition, not a theoretical result: we simply denote the quantity $\operatorname{P}(A\cap B)/\operatorname{P}(B)$ by $\operatorname{P}(A|B)$ and call it the conditional probability of $A$ given $B$.
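For a finite sample space with equally likely outcomes, the definition can be computed directly by restricting the sample space to $B$, as a minimal sketch (the helper name `conditional_probability` is ours, not from the text):

```python
from fractions import Fraction

def conditional_probability(A, B, sample_space):
    """P(A|B) on a finite sample space with equally likely outcomes."""
    B_outcomes = [w for w in sample_space if B(w)]
    if not B_outcomes:
        raise ValueError("P(B) = 0, so P(A|B) is undefined")
    # Restricting to B makes it the new sample space; P(A ∩ B)/P(B)
    # reduces to |A ∩ B| / |B| when all outcomes are equally likely.
    joint = [w for w in B_outcomes if A(w)]
    return Fraction(len(joint), len(B_outcomes))

# One fair die: P(roll = 2 | roll is even) = 1/3
p = conditional_probability(lambda w: w == 2, lambda w: w % 2 == 0, range(1, 7))
```

Note how the denominator is the count of outcomes in $B$, not the whole sample space — exactly the "new sample space" idea above.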

## Example

Suppose that somebody secretly rolls two fair six-sided dice, and we must predict the outcome. Let $A$ be the value rolled on die 1 and let $B$ be the value rolled on die 2.

### What is the probability that $A=2$ ?

Table 1 shows the sample space of 36 outcomes. Clearly, $A = 2$ in exactly 6 of the 36 outcomes, thus $\operatorname{P}(A=2) = 6/36 = 1/6$.

Table 1. Each cell shows the sum $A + B$.

|       | $B=1$ | $B=2$ | $B=3$ | $B=4$ | $B=5$ | $B=6$ |
|-------|-------|-------|-------|-------|-------|-------|
| $A=1$ | 2     | 3     | 4     | 5     | 6     | 7     |
| $A=2$ | 3     | 4     | 5     | 6     | 7     | 8     |
| $A=3$ | 4     | 5     | 6     | 7     | 8     | 9     |
| $A=4$ | 5     | 6     | 7     | 8     | 9     | 10    |
| $A=5$ | 6     | 7     | 8     | 9     | 10    | 11    |
| $A=6$ | 7     | 8     | 9     | 10    | 11    | 12    |

### What is the probability $A+B \leq 5$ ?

Table 2 shows that $A+B \leq 5$ for exactly 10 of the same 36 outcomes, thus $\operatorname{P}(A +B \leq 5) = 10/36$.

Table 2. Outcomes with $A + B \leq 5$ are in bold.

|       | $B=1$ | $B=2$ | $B=3$ | $B=4$ | $B=5$ | $B=6$ |
|-------|-------|-------|-------|-------|-------|-------|
| $A=1$ | **2** | **3** | **4** | **5** | 6     | 7     |
| $A=2$ | **3** | **4** | **5** | 6     | 7     | 8     |
| $A=3$ | **4** | **5** | 6     | 7     | 8     | 9     |
| $A=4$ | **5** | 6     | 7     | 8     | 9     | 10    |
| $A=5$ | 6     | 7     | 8     | 9     | 10    | 11    |
| $A=6$ | 7     | 8     | 9     | 10    | 11    | 12    |

### What is the probability that $A = 2$ given that $A + B \leq 5$ ?

Table 3 shows that $A = 2$ for 3 of these 10 outcomes, thus the conditional probability $\operatorname{P}(A = 2 | A + B \leq 5) = 3/10$.

Table 3. Of the 10 outcomes with $A + B \leq 5$, those with $A = 2$ are in bold.

|       | $B=1$ | $B=2$ | $B=3$ | $B=4$ | $B=5$ | $B=6$ |
|-------|-------|-------|-------|-------|-------|-------|
| $A=1$ | 2     | 3     | 4     | 5     | 6     | 7     |
| $A=2$ | **3** | **4** | **5** | 6     | 7     | 8     |
| $A=3$ | 4     | 5     | 6     | 7     | 8     | 9     |
| $A=4$ | 5     | 6     | 7     | 8     | 9     | 10    |
| $A=5$ | 6     | 7     | 8     | 9     | 10    | 11    |
| $A=6$ | 7     | 8     | 9     | 10    | 11    | 12    |
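All three answers can be checked by enumerating the 36 equally likely outcomes directly; this is a sketch of that computation, not part of the original text:

```python
from fractions import Fraction

# Sample space: all 36 equally likely (die 1, die 2) pairs
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# P(A = 2): 6 of 36 outcomes
p_a_is_2 = Fraction(sum(1 for a, b in outcomes if a == 2), len(outcomes))

# P(A + B <= 5): 10 of 36 outcomes
p_sum_le_5 = Fraction(sum(1 for a, b in outcomes if a + b <= 5), len(outcomes))

# Conditioning on A + B <= 5 restricts the sample space to those 10 outcomes
restricted = [(a, b) for a, b in outcomes if a + b <= 5]
p_cond = Fraction(sum(1 for a, b in restricted if a == 2), len(restricted))
```

The conditioning step is just the table argument in code: count the favourable outcomes inside the restricted sample space.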

## Use in inference

In statistical inference, conditional probability is an update of the probability of an event based on new information. Incorporating the new information can be done as follows:

- Let $A$, the event of interest, be in the sample space.
- The occurrence of the event $A$, knowing that event $B$ has or will have occurred, means the occurrence of $A$ as restricted to $B$, i.e. $A \cap B$.
- Without knowledge of the occurrence of $B$, the information about the occurrence of $A$ would simply be $\operatorname{P}(A)$.
- The probability of $A$, knowing that event $B$ has or will have occurred, is the probability of $A \cap B$ relative to $\operatorname{P}(B)$, the probability that $B$ has occurred.
- This results in $\operatorname{P}(A|B) = \operatorname{P}(A \cap B)/\operatorname{P}(B)$ whenever $\operatorname{P}(B) \gt 0$, and $0$ otherwise.

The phraseology "evidence" or "information" is generally used in the Bayesian interpretation of probability. The conditioning event is interpreted as evidence for the conditioned event. That is, $\operatorname{P}(A)$ is the probability of $A$ before accounting for evidence $E$, and $\operatorname{P}(A|E)$ is the probability of $A$ after having accounted for evidence $E$ or after having updated $\operatorname{P}(A)$.

## Common fallacies

### Assuming conditional probability is of similar size to its inverse

In general, it cannot be assumed that $\operatorname{P}(A|B) \approx \operatorname{P}(B|A)$. This can be an insidious error, even for those who are highly conversant with statistics. The relationship between $\operatorname{P}(A|B)$ and $\operatorname{P}(B|A)$ is given by Bayes' theorem:

$$\operatorname{P}(B|A) = \frac{\operatorname{P}(A|B) \operatorname{P}(B)}{\operatorname{P}(A)} \Leftrightarrow \frac{\operatorname{P}(B|A)}{\operatorname{P}(A|B)} = \frac{\operatorname{P}(B)}{\operatorname{P}(A)}.$$

That is, $\operatorname{P}(A|B) \approx \operatorname{P}(B|A)$ only if $\operatorname{P}(B)/\operatorname{P}(A) \approx 1$, or equivalently, $\operatorname{P}(A) \approx \operatorname{P}(B)$. Alternatively, noting that $A \cap B = B \cap A$, and applying the definition of conditional probability:

$$\operatorname{P}(A|B)\operatorname{P}(B) = \operatorname{P}(A \cap B) = \operatorname{P}(B \cap A) = \operatorname{P}(B|A)\operatorname{P}(A)$$

Rearranging gives the result.
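The dice example above makes the asymmetry concrete: with $A$ the event "die 1 shows 2" and $B$ the event "$A + B \leq 5$", the two conditionals differ by exactly the factor $\operatorname{P}(B)/\operatorname{P}(A)$. A sketch of that check:

```python
from fractions import Fraction

# Same two-dice sample space as in the example above
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def p(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] == 2            # die 1 shows 2
B = lambda w: w[0] + w[1] <= 5     # the dice sum to at most 5

p_A_given_B = p(lambda w: A(w) and B(w)) / p(B)   # 3/10
p_B_given_A = p(lambda w: A(w) and B(w)) / p(A)   # 1/2: not close to 3/10

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
bayes_rhs = p_A_given_B * p(B) / p(A)
```

Here $\operatorname{P}(A|B) = 3/10$ but $\operatorname{P}(B|A) = 1/2$, and the gap is exactly the ratio $\operatorname{P}(B)/\operatorname{P}(A) = (10/36)/(6/36)$.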

### Assuming marginal and conditional probabilities are of similar size

In general, it cannot be assumed that $\operatorname{P}(A) \approx \operatorname{P}(A|B)$. These probabilities are linked through the law of total probability:

$$\operatorname{P}(A) = \sum_n \operatorname{P}(A \cap B_n) = \sum_n \operatorname{P}(A|B_n)\operatorname{P}(B_n)$$

where the events $(B_n)$ form a countable partition of the sample space.
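The law can be verified on the dice example by partitioning the sample space on the value of die 1; a sketch, again using exact fractions:

```python
from fractions import Fraction

outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def p(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] + w[1] <= 5

# The events B_n = "die 1 shows n", n = 1..6, partition the sample space
total = Fraction(0)
for n in range(1, 7):
    B_n = lambda w, n=n: w[0] == n
    p_B_n = p(B_n)
    p_A_given_B_n = p(lambda w: A(w) and B_n(w)) / p_B_n
    total += p_A_given_B_n * p_B_n

# By the law of total probability, total equals P(A) = 10/36
```

Each summand $\operatorname{P}(A|B_n)\operatorname{P}(B_n)$ is just $\operatorname{P}(A \cap B_n)$, so the sum recovers $\operatorname{P}(A)$ because the $B_n$ cover the sample space without overlap.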

This fallacy may arise through selection bias. For example, in the context of a medical claim, let $S_C$ be the event that a sequela (chronic disease) $S$ occurs as a consequence of circumstance (acute condition) $C$. Let $H$ be the event that an individual seeks medical help. Suppose that in most cases, $C$ does not cause $S$, so $\operatorname{P}(S_{C})$ is low. Suppose also that medical attention is only sought if $S$ has occurred due to $C$. From experience with patients, a doctor may therefore erroneously conclude that $\operatorname{P}(S_C)$ is high. The actual probability observed by the doctor is $\operatorname{P}(S_C|H)$.
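A small simulation makes the selection-bias gap visible. The 5% sequela rate is a hypothetical figure chosen for illustration, not from the text:

```python
import random

random.seed(0)

# Hypothetical rate, for illustration only: the acute condition C
# rarely leads to the sequela S, so P(S_C) is low.
P_SEQUELA = 0.05

# Simulate 100,000 people who had condition C; True means S occurred.
population = [random.random() < P_SEQUELA for _ in range(100_000)]

# Selection mechanism from the text: help is sought only if S occurred,
# so the doctor only ever sees patients for whom S is present.
seen_by_doctor = [s for s in population if s]

overall_rate = sum(population) / len(population)            # estimates P(S_C): low
observed_rate = sum(seen_by_doctor) / len(seen_by_doctor)   # P(S_C | H) = 1 here
```

Under this (deliberately extreme) selection rule, every patient the doctor sees has the sequela, so the observed conditional rate is 100% even though the marginal rate is about 5%.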