# Transformations

If $X$ is a random variable with cumulative distribution function $F_X$, we may produce new random variables by applying a transformation of the form $g(X)$ for suitable functions $g$. Such transformations are used often in probability and statistics: the transformed random variables frequently have desirable properties, and those properties can in turn yield results about the original random variables. On this page, we are mainly concerned with computing the probability distribution of the transformed random variable in terms of the probability distribution of the original one.

## Linear Transformations

We first consider the simplest possible transformation: the linear transformation. If $a$ and $b$ are real numbers, then we may consider the random variable

[$] \begin{equation} Y = T(X) = aX + b. \end{equation} [$]

If $a = 0$ then there is nothing to discuss, since the transformation is just the constant $b$; we may therefore assume that $a$ is non-zero.

### a > 0

If $a$ is positive then $T$ is a strictly increasing function and we have:

[] \begin{align} F_{Y}(y) = \operatorname{P}(aX + b \leq y ) = \operatorname{P}(X \leq a^{-1}(y - b)) &= F_{X}[a^{-1}(y - b)] \end{align} []

### a < 0 and $X$ continuous

If $a$ is negative and $X$ is a continuous random variable, then $T$ is a strictly decreasing function and we have:

[] \begin{align} F_{Y}(y) = \operatorname{P}(aX + b \leq y ) = \operatorname{P}(X \geq a^{-1}(y - b)) &= 1 - F_{X}[a^{-1}(y - b)] . \end{align} []
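These two identities are easy to check numerically. The sketch below compares the empirical CDF of $aX + b$ against the formulas above; the choice of a standard normal $X$, the particular values of $a$ and $b$, and the evaluation point are all arbitrary illustrative assumptions:

```python
# Simulation check of the linear-transformation CDF identities.
import numpy as np
from math import erf, sqrt

def F_X(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)  # samples of X

for a, b in [(2.0, 1.0), (-3.0, 0.5)]:
    point = 0.7                          # arbitrary evaluation point
    empirical = np.mean(a * x + b <= point)
    if a > 0:
        theoretical = F_X((point - b) / a)        # F_X[a^{-1}(y - b)]
    else:
        theoretical = 1.0 - F_X((point - b) / a)  # 1 - F_X[a^{-1}(y - b)]
    print(a, b, round(empirical, 3), round(theoretical, 3))
```

With $2 \times 10^5$ samples the empirical and theoretical values agree to roughly two decimal places.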

## Monotone Transformations

### Strictly Increasing

Suppose that the transformation, denoted by $T$, is strictly increasing:

[$] x_1 \lt x_2 \implies T(x_1) \lt T(x_2). [$]

We denote by $T^{-1}$ the unique transformation with the property

[$] T^{-1}(T(x)) = x \quad \text{for all } x, \quad \text{that is,} \quad T^{-1}\circ T = I, [$]

with $I$ the identity function. Following the approach for linear transformations, we have

[] \begin{align} \operatorname{P}[T(X) \leq y ] = \operatorname{P}[X \leq T^{-1}(y)] = F_{X}[T^{-1}(y)]. \end{align} []

Thus we have the following simple relation:

[$] \begin{equation} \label{transform-rel-up} X \mapsto T(X)=Y \implies F_Y = F_{X} \circ T^{-1}. \end{equation} [$]

### Strictly Decreasing and X Continuous

If the transformation $T$ is strictly decreasing and $F_X$ is continuous, then

[$] \operatorname{P}[T(X)\leq y ] = \operatorname{P}[X \geq T^{-1}(y)] = 1 - F_{X}[T^{-1}(y)] [$]

and thus

[$] \begin{equation} \label{transform-rel-down} X \mapsto T(X)=Y \implies F_Y = 1 - F_{X} \circ T^{-1}. \end{equation} [$]
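As a quick numerical sanity check of this relation, the sketch below uses $T(x) = e^{-x}$ (strictly decreasing, with $T^{-1}(y) = -\ln y$) and a standard normal $X$; both choices are illustrative assumptions:

```python
# Simulation check of F_Y = 1 - F_X ∘ T^{-1} for a decreasing T.
import numpy as np
from math import erf, sqrt, log

def F_X(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)
y_samples = np.exp(-x)            # Y = T(X) = exp(-X)

y = 1.5                           # arbitrary evaluation point
empirical = np.mean(y_samples <= y)
theoretical = 1.0 - F_X(-log(y))  # F_Y(y) = 1 - F_X(T^{-1}(y))
print(round(empirical, 3), round(theoretical, 3))
```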

### Probability Density Functions

If the cumulative distribution function $F_X$ has a density say $f_X$, then we see from \ref{transform-rel-up} and \ref{transform-rel-down} that the following relation holds:

[$] \begin{equation} \label{monotone-density-relation} X \mapsto T(X)=Y \implies f_Y = \frac{ f_{X} \circ T^{-1}}{\left |T^{\prime} \circ T^{-1} \right |}. \end{equation} [$]

To be precise, relation \ref{monotone-density-relation} is true when the integrability condition $\int f_{Y}(y) \, dy \lt \infty$ holds, where the integral is taken over the range of $T$ (in the example below the range is $(0, \infty)$).
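One way to convince oneself of \ref{monotone-density-relation} is to compare it with a finite-difference derivative of $F_Y = F_X \circ T^{-1}$. The sketch below does this for the illustrative choice $T(x) = x^3$ (so $T'(x) = 3x^2$ and $T^{-1}(y) = y^{1/3}$) with $X$ standard normal:

```python
# Finite-difference check of f_Y = (f_X ∘ T^{-1}) / |T' ∘ T^{-1}|
# for T(x) = x^3 and X standard normal.
import numpy as np
from scipy.stats import norm

def f_Y(y):
    # Density formula: f_X(y^{1/3}) / (3 * y^{2/3}).
    r = np.cbrt(y)
    return norm.pdf(r) / (3.0 * r ** 2)

y, h = 2.0, 1e-6
# Numerical derivative of F_Y(y) = F_X(y^{1/3}) by central differences.
numeric = (norm.cdf(np.cbrt(y + h)) - norm.cdf(np.cbrt(y - h))) / (2 * h)
print(round(f_Y(y), 6), round(numeric, 6))
```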

### Example: Exponentiation

Consider the transformation $T(x) = \exp(x)$. By \ref{transform-rel-up}, we have $F_Y(y) = F_{X}(\ln(y))$. If $F_X$ has a density $f_X$ then, by \ref{monotone-density-relation},

[$] f_{Y}(y) = \frac{ f_{X}(\ln(y))}{y}[$]

provided that

[$] \int_{0}^{\infty} \frac{ f_{X}(\ln(y))}{y} \, dy \lt \infty.[$]

For instance, let $X$ be a random variable with a standard normal distribution:

[$] f_{X}(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}. [$]

The random variable $\exp(X)$ is said to have a lognormal distribution. It is fairly easy to show that

[] \begin{align*} \int_{0}^{\infty} \frac{\exp{[-\ln(y)^2/2]}}{y} \, dy \lt \infty \end{align*} []

and thus the density for the lognormal distribution is given by

[$] f_{Y}(y) = \frac{1}{\sqrt{2\pi}} \frac{\exp{[-\ln(y)^2/2]}}{y}. [$]
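The lognormal density can be checked against SciPy's implementation (`scipy.stats.lognorm` with shape parameter `s=1` and the default scale is the standard lognormal); the comparison points below are arbitrary:

```python
# Compare the derived lognormal density with scipy.stats.lognorm.
import numpy as np
from scipy.stats import lognorm

def f_Y(y):
    # Density of exp(X) for X standard normal, as derived above.
    return np.exp(-np.log(y) ** 2 / 2.0) / (y * np.sqrt(2.0 * np.pi))

ys = np.array([0.2, 0.5, 1.0, 2.0, 5.0])
print(np.max(np.abs(f_Y(ys) - lognorm.pdf(ys, s=1.0))))
```

The maximum discrepancy is at the level of floating-point round-off.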

## General Case

For a general transformation $T$ where $Y = T(X)$, there is no simple and explicit relation between $F_X$ and $F_Y$. That being said, there are situations when we can use conditioning as well as the relations we've already derived to compute the distribution of $Y$. More precisely, given a partition (splitting up) of the real line

[$] -\infty \leq a_0 \lt a_1 \lt \cdots \lt a_n \leq \infty \quad ( 0 = F_X(a_0) \lt F_X(a_1) \lt \cdots \lt F_X(a_n) = 1), [$]

we let $X_i$ denote a random variable with distribution equal to the conditional distribution of $X$ given that $X$ lies in the interval $(a_{i-1},a_i]$, and let $Y_i = T(X_i)$. Then we have

[] \begin{align} F_{Y}(x) &= \sum_{i=1}^n F_{Y_i}(x) \operatorname{P}[a_{i-1} \lt X \leq a_{i}] \\ \label{gen-case-cdf-transform} &= \sum_{i=1}^n F_{Y_i}(x) [F_X(a_{i}) - F_X(a_{i-1})]. \end{align} []

If the partition is chosen in such a way that $T$ -- when applied to any of the $X_i$ -- satisfies a property that we have encountered in previous sections (linear or monotone), then we can use \ref{gen-case-cdf-transform} to derive a relatively simple expression for the distribution function of $Y$.
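The conditioning decomposition is easy to illustrate by Monte Carlo: estimate each conditional CDF $F_{Y_i}$ from the samples falling in the corresponding piece of the partition and recombine with the piece probabilities. The sketch below does this for $T(x) = x^2$ with a standard normal $X$ and the partition point $a_1 = 0$, choices made purely for illustration (the next section treats this case analytically):

```python
# Monte Carlo illustration of the conditioning identity:
# F_Y(x) = sum_i F_{Y_i}(x) * P[a_{i-1} < X <= a_i].
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)
T = lambda v: v ** 2
point = 1.3  # arbitrary evaluation point

# Partition a_0 = -inf, a_1 = 0, a_2 = inf.
pieces = [x[x <= 0], x[x > 0]]
combined = sum(
    np.mean(T(piece) <= point) * (len(piece) / len(x))  # F_{Y_i} * weight
    for piece in pieces
)
direct = np.mean(T(x) <= point)  # empirical F_Y computed directly
print(round(combined, 3), round(direct, 3))
```

The two estimates coincide, since splitting the samples by piece and reweighting simply regroups the same counts.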

### Example: Squaring

Let $T(x) = x^2$ and suppose that $0 \lt F_X(0) \lt 1$, so that both conditional distributions below are well defined. Then we set

[$] a_0 = -\infty, a_1 = 0, a_2 = \infty. [$]

Setting $p = F_X(0)$ and recalling \ref{gen-case-cdf-transform}, we obtain

[$] \begin{equation} \label{square-1} F_Y(y) = F_{Y_1}(y)\cdot p + F_{Y_2}(y) \cdot (1-p). \end{equation} [$]

By \ref{transform-rel-up} and \ref{transform-rel-down} (the latter requiring $F_X$ to be continuous), we know that

[$] \begin{equation} \label{square-2} F_{Y_1}(y) =1 - F_{X_1}\left(-\sqrt{y}\right) = \operatorname{P}[X \geq -\sqrt{y} | X \leq 0] \end{equation} [$]

and

[$] \begin{equation} \label{square-3} F_{Y_2}(y) = F_{X_2}\left(\sqrt{y}\right) = \operatorname{P}[X \leq \sqrt{y} | X \gt 0]. \end{equation} [$]

Combining \ref{square-1}, \ref{square-2} and \ref{square-3}, and noting that $p\operatorname{P}[X \geq -\sqrt{y} | X \leq 0] = F_X(0) - F_{X}(-\sqrt{y})$ while $(1-p)\operatorname{P}[X \leq \sqrt{y} | X \gt 0] = F_{X}(\sqrt{y}) - F_X(0)$, we finally obtain the relation

[$] \begin{equation} \label{square-final} X \mapsto Y=X^2 \implies F_{Y}(y) = F_{X}\left(\sqrt{y}\right) - F_{X}\left(-\sqrt{y}\right). \end{equation} [$]

The purpose of this simple example was to demonstrate the use of the conditioning method. A simpler and more direct approach would have also worked:

[$] F_Y(y) = \operatorname{P}[X^2 \leq y] =\operatorname{P}[\left|X\right| \leq \sqrt{y}] = F_{X}\left(\sqrt{y}\right) - F_{X}\left(-\sqrt{y}\right). [$]

For $y \gt 0$, the derivative of $F_Y$ equals

[$] \begin{equation} \frac{1}{2\sqrt{y}}[f_{X}(\sqrt{y}) + f_{X}(-\sqrt{y})]. \end{equation} [$]

Therefore we obtain the following relation:

[$] \begin{equation} \label{square-final-density} X \mapsto Y=X^2 \implies f_{Y}(y) = \frac{1}{2\sqrt{y}}[f_{X}(\sqrt{y}) + f_{X}(-\sqrt{y})] \end{equation} [$]

provided that

[$] \begin{equation} \label{square-density-condition} \int_0^{\infty}\frac{1}{2\sqrt{y}}[f_{X}(\sqrt{y}) + f_{X}(-\sqrt{y})] \, dy \lt \infty. \end{equation} [$]

To demonstrate this technique, consider squaring a random variable $X$ with a standard normal distribution. The integrability condition \ref{square-density-condition} is equivalent to

[$] \int_0^{\infty}\frac{1}{\sqrt{y}}e^{-y/2} \, dy \lt \infty [$]

which holds since the integrand behaves like $y^{-1/2}$ near $0$ and decays exponentially at infinity. The distribution of $X^2$ is a chi-square distribution with 1 degree of freedom, and its density equals

[$] \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{y}} e^{-y/2}. [$]
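As a final check, the derived density agrees with SciPy's chi-square implementation (`scipy.stats.chi2` with one degree of freedom); the evaluation points below are arbitrary:

```python
# Compare the derived density of X^2 with scipy.stats.chi2 (df=1).
import numpy as np
from scipy.stats import chi2

def f_Y(y):
    # Density of X^2 for X standard normal, as derived above.
    return np.exp(-y / 2.0) / np.sqrt(2.0 * np.pi * y)

ys = np.array([0.1, 0.5, 1.0, 2.0, 4.0])
print(np.max(np.abs(f_Y(ys) - chi2.pdf(ys, df=1))))
```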