# Exponential Smoothing

In mathematics and statistics, a stationary process (or a strict/strictly stationary process or strong/strongly stationary process) is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance also do not change over time. To get an intuition of stationarity, one can imagine a frictionless pendulum. It swings back and forth in an oscillatory motion, yet the amplitude and frequency remain constant. Although the pendulum is moving, the process is stationary as its "statistics" are constant (frequency and amplitude). However, if a force were to be applied to the pendulum (for example, friction with the air), either the frequency or amplitude would change, thus making the process non-stationary.

Since stationarity is an assumption underlying many statistical procedures used in time series analysis, non-stationary data are often transformed to become stationary. The most common cause of violation of stationarity is a trend in the mean, which can be due either to the presence of a unit root or of a deterministic trend. In the former case of a unit root, stochastic shocks have permanent effects, and the process is not mean-reverting. In the latter case of a deterministic trend, the process is called a trend-stationary process, and stochastic shocks have only transitory effects after which the variable tends toward a deterministically evolving (non-constant) mean.

A trend-stationary process is not strictly stationary, but can easily be transformed into a stationary process by removing the underlying trend, which is solely a function of time. Similarly, processes with one or more unit roots can be made stationary through differencing. An important type of non-stationary process that does not include trend-like behavior is a cyclostationary process, which is a stochastic process that varies cyclically with time.

## Strict-sense stationarity

### Definition

Formally, let $\left\{X_t\right\}$ be a stochastic process and let $F_{X}(x_{t_1 + \tau}, \ldots, x_{t_n + \tau})$ represent the cumulative distribution function of the unconditional (i.e., with no reference to any particular starting value) joint distribution of $\left\{X_t\right\}$ at times $t_1 + \tau, \ldots, t_n + \tau$. Then, $\left\{X_t\right\}$ is said to be strictly stationary, strongly stationary or strict-sense stationary if:

[$]\begin{equation}\label{sss} F_{X}(x_{t_1+\tau} ,\ldots, x_{t_n+\tau}) = F_{X}(x_{t_1},\ldots, x_{t_n}) \quad \text{for all } \tau,t_1, \ldots, t_n \in \mathbb{R} \text{ and for all } n \in \mathbb{N}\end{equation}[$]

Since $\tau$ does not affect $F_X(\cdot)$, $F_{X}$ is not a function of time.

### Examples

Figure: Two simulated time series processes, one stationary and the other non-stationary. The augmented Dickey–Fuller (ADF) test statistic is reported for each process; non-stationarity cannot be rejected for the second process at a 5% significance level.

White noise is the simplest example of a stationary process.

An example of a discrete-time stationary process where the sample space is also discrete (so that the random variable may take one of N possible values) is a Bernoulli scheme. Other examples of a discrete-time stationary process with continuous sample space include some autoregressive and moving average processes which are both subsets of the autoregressive moving average model. Models with a non-trivial autoregressive component may be either stationary or non-stationary, depending on the parameter values, and important non-stationary special cases are where unit roots exist in the model.

#### Example 1

Let $Y$ be any scalar random variable, and define a time series $\left\{X_t\right\}$ by $X_t=Y$ for all $t$. Then $\left\{X_t\right\}$ is a stationary time series, for which realisations consist of a series of constant values, with a different constant value for each realisation. A law of large numbers does not apply in this case, as the limiting value of an average from a single realisation takes the random value determined by $Y$, rather than taking the expected value of $Y$.

In other words, the time average of $X_t$ converges to $Y$ rather than to the expected value of $Y$, because the process is not ergodic.
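A small simulation can make this concrete. The sketch below assumes, purely for illustration, that $Y$ is uniform on $[0, 1)$; the helper names `constant_process` and `time_average` are hypothetical and not part of any library.

```python
import random

def constant_process(rng, n_steps):
    """Return one realisation of X_t = Y for t = 0, ..., n_steps - 1.

    Y ~ Uniform[0, 1) is an assumption made for this illustration only.
    """
    y = rng.random()
    return [y] * n_steps

def time_average(path):
    """Average of a single realisation over time."""
    return sum(path) / len(path)

rng = random.Random(0)  # fixed seed so the run is reproducible
paths = [constant_process(rng, 1_000) for _ in range(5)]
time_averages = [time_average(path) for path in paths]

for path, average in zip(paths, time_averages):
    # The time average reproduces that realisation's own draw of Y,
    # not the ensemble mean E[Y] = 0.5.
    print(f"Y = {path[0]:.4f}, time average = {average:.4f}")
```

Each realisation reports a different time average, and no amount of additional time steps moves any of them toward $0.5$.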

#### Example 2

As a further example of a stationary process for which any single realisation has an apparently noise-free structure, let $Y$ have a uniform distribution on $(0,2\pi]$ and define the time series $\left\{X_t\right\}$ by

[$]X_t=\cos (t+Y) \quad \text{ for } t \in \mathbb{R}. [$]

Then $\left\{X_t\right\}$ is strictly stationary since ($(t+ Y)$ modulo $2 \pi$) follows the same uniform distribution as $Y$ for any $t$.
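As a partial numerical check, the sketch below averages over the phase $Y$ with a midpoint rule and confirms that the mean and variance of $X_t$ do not depend on $t$. This only tests the first two moments, which is a necessary consequence of strict stationarity and not a proof of it; the function names and the chosen values of $t$ are illustrative assumptions.

```python
import math

def phase_average(func, n_grid=10_000):
    """Approximate E[func(Y)] for Y ~ Uniform(0, 2*pi] with a midpoint rule."""
    step = 2 * math.pi / n_grid
    return sum(func((k + 0.5) * step) for k in range(n_grid)) / n_grid

def mean_and_variance(t):
    """Mean and variance of X_t = cos(t + Y), averaged over the phase Y."""
    mean = phase_average(lambda y: math.cos(t + y))
    second_moment = phase_average(lambda y: math.cos(t + y) ** 2)
    return mean, second_moment - mean ** 2

# Arbitrary time points chosen for the illustration.
moments = {t: mean_and_variance(t) for t in (0.0, 0.7, 3.0, 10.0)}
for t, (mean, variance) in moments.items():
    print(f"t = {t:4.1f}: mean = {mean:+.6f}, variance = {variance:.6f}")
```

Every time point should report a mean of essentially $0$ and a variance of essentially $1/2$.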

#### Example 3

White noise, which is defined only through its first two moments, is not necessarily strictly stationary. Let $\omega$ be a random variable uniformly distributed in the interval $(0, 2\pi)$ and define the time series $\left\{z_t\right\}$ by

[$]z_t=\cos(t\omega) \quad (t=1,2,...) [$]

Then

[] \begin{align*} \mathbb{E}(z_t) &= \frac{1}{2\pi} \int_0^{2\pi} \cos(t\omega) \,d\omega = 0,\\ \operatorname{Var}(z_t) &= \frac{1}{2\pi} \int_0^{2\pi} \cos^2(t\omega) \,d\omega = 1/2,\\ \operatorname{Cov}(z_t , z_j) &= \frac{1}{2\pi} \int_0^{2\pi} \cos(t\omega)\cos(j\omega) \,d\omega = 0 \quad \forall t\neq j. \end{align*} []

So $\{z_t\}$ is white noise; however, it is not strictly stationary.
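One way to see the failure of strict stationarity is to compare a third-order joint moment before and after a time shift. Expanding the products with standard trigonometric identities gives $\operatorname{E}[z_1^2 z_2] = 1/4$ but $\operatorname{E}[z_2^2 z_3] = 0$, so the joint distribution of $(z_1, z_2)$ differs from that of $(z_2, z_3)$. The sketch below checks these values, together with the three integrals above, by numerical integration; the helper names and the particular time indices are assumptions made for illustration.

```python
import math

def expectation(func, n_grid=20_000):
    """Approximate E[func(omega)] for omega ~ Uniform(0, 2*pi) (midpoint rule)."""
    step = 2 * math.pi / n_grid
    return sum(func((k + 0.5) * step) for k in range(n_grid)) / n_grid

def z(t, omega):
    """The process z_t = cos(t * omega) evaluated for a given omega."""
    return math.cos(t * omega)

# First and second moments: consistent with white noise.
mean_z3 = expectation(lambda w: z(3, w))
var_z3 = expectation(lambda w: z(3, w) ** 2)
cov_z2_z5 = expectation(lambda w: z(2, w) * z(5, w))

# A third-order joint moment changes when both time indices shift by 1.
third_moment_12 = expectation(lambda w: z(1, w) ** 2 * z(2, w))
third_moment_23 = expectation(lambda w: z(2, w) ** 2 * z(3, w))

print(f"mean = {mean_z3:.6f}, variance = {var_z3:.6f}, covariance = {cov_z2_z5:.6f}")
print(f"E[z_1^2 z_2] = {third_moment_12:.6f}, E[z_2^2 z_3] = {third_moment_23:.6f}")
```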

## $N$th-order stationarity

In $\ref{sss}$, the distribution of $n$ samples of the stochastic process must be equal to the distribution of the samples shifted in time for all $n$. $N$th-order stationarity is a weaker form of stationarity where this is only required for all $n$ up to a certain order $N$. A random process $\left\{X_t\right\}$ is said to be $N$th-order stationary if:

[$] F_{X}(x_{t_1+\tau} ,\ldots, x_{t_n+\tau}) = F_{X}(x_{t_1},\ldots, x_{t_n}) \quad \text{for all } \tau,t_1, \ldots, t_n \in \mathbb{R} \text{ and for all } n \in \{1,\ldots,N\} [$]

## Weak or wide-sense stationarity

### Definition

A weaker form of stationarity commonly employed in signal processing is known as weak-sense stationarity, wide-sense stationarity (WSS), or covariance stationarity. WSS random processes only require that the 1st moment (i.e. the mean) and the autocovariance do not vary with respect to time and that the 2nd moment is finite for all times. Any strictly stationary process which has a finite mean and covariance is also WSS.

So, a continuous time random process $\left\{X_t\right\}$ which is WSS has the following restrictions on its mean function $m_X(t) \triangleq \operatorname E[X_t]$ and autocovariance function $K_{XX}(t_1, t_2) \triangleq \operatorname E[(X_{t_1}-m_X(t_1))(X_{t_2}-m_X(t_2))]$:

[] \begin{align*} & m_X(t) = m_X(t + \tau) & & \text{for all } \tau \in \mathbb{R} \\ & K_{XX}(t_1, t_2) = K_{XX}(t_1 - t_2, 0) & & \text{for all } t_1,t_2 \in \mathbb{R} \\ & \operatorname E[|X_t|^2] \lt \infty & & \text{for all } t \in \mathbb{R} \end{align*} []

The first property implies that the mean function $m_X(t)$ must be constant. The second property implies that the covariance function depends only on the difference between $t_1$ and $t_2$ and only needs to be indexed by one variable rather than two variables. Thus, instead of writing

[$]\,\!K_{XX}(t_1 - t_2, 0)\,[$]

the notation is often abbreviated by the substitution $\tau = t_1 - t_2$:

[$]K_{XX}(\tau) \triangleq K_{XX}(t_1 - t_2, 0)[$]

This also implies that the autocorrelation depends only on $\tau = t_1 - t_2$, that is

[$]\,\! R_X(t_1,t_2) = R_X(t_1-t_2,0) \triangleq R_X(\tau).[$]

The third property says that the second moments must be finite for any time $t$.

## Differencing

One way to make some time series stationary is to compute the differences between consecutive observations. This is known as differencing. Differencing can help stabilize the mean of a time series by removing changes in its level, and so eliminate trends. It can also remove seasonality if the differences are taken appropriately (e.g. differencing observations 1 year apart to remove a yearly seasonal pattern).
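A minimal sketch of differencing on two toy series follows. The `difference` helper and the example data are hypothetical, chosen so that the results can be checked by hand; a period of 4 steps stands in for a yearly cycle.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both terms exist."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Linear trend: first differences are constant, so the changing level is gone.
trend = [3 + 2 * t for t in range(8)]
print(difference(trend))            # [2, 2, 2, 2, 2, 2, 2]

# Trend plus a pattern that repeats every 4 steps (a toy stand-in for seasonality).
pattern = [10, 20, 15, 5]
seasonal = [2 * t + pattern[t % 4] for t in range(12)]
print(difference(seasonal, lag=4))  # [8, 8, 8, 8, 8, 8, 8, 8]
```

Differencing at the seasonal lag cancels the repeating pattern exactly in this noise-free example; with real data the result would still contain the differenced noise.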

Transformations such as logarithms can help to stabilize the variance of a time series.

One way to identify a non-stationary time series is the autocorrelation function (ACF) plot. Sometimes, seasonal patterns will be more visible in the ACF plot than in the original time series; however, this is not always the case. Non-stationary time series can look stationary.
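To illustrate how the ACF can separate the two cases, the sketch below compares the sample autocorrelation of simulated Gaussian white noise with that of a random walk built from the same shocks. The estimator shown (dividing by the series length at every lag) is one common convention, and the seed, series length and function name are assumptions made for this example.

```python
import random

def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (divides by n at every lag)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    c0 = sum(d * d for d in centred) / n
    acf = []
    for lag in range(max_lag + 1):
        c_lag = sum(centred[t + lag] * centred[t] for t in range(n - lag)) / n
        acf.append(c_lag / c0)
    return acf

rng = random.Random(42)  # fixed seed so the run is reproducible
noise = [rng.gauss(0.0, 1.0) for _ in range(2_000)]

# Random walk: cumulative sum of the noise (a unit-root process).
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

acf_noise = sample_acf(noise, 10)
acf_walk = sample_acf(walk, 10)
print("white noise, lag 10:", round(acf_noise[10], 3))
print("random walk, lag 10:", round(acf_walk[10], 3))
```

The white-noise ACF should be close to zero at every non-zero lag, while the random-walk ACF typically stays near 1 and decays only slowly, which is the usual visual cue for non-stationarity in an ACF plot.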

Another approach to identifying non-stationarity is to look at the Laplace transform of a series, which will identify both exponential trends and sinusoidal seasonality (complex exponential trends). Related techniques from signal analysis such as the wavelet transform and Fourier transform may also be helpful.

Autocorrelation, sometimes known as serial correlation in the discrete time case, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.

Different fields of study define autocorrelation differently, and not all of these definitions are equivalent. In some fields, the term is used interchangeably with autocovariance.

Unit root processes, trend-stationary processes, autoregressive processes, and moving average processes are specific forms of processes with autocorrelation.

## Autocorrelation of stochastic processes

In statistics, the autocorrelation of a real or complex random process is the Pearson correlation between values of the process at different times, as a function of the two times or of the time lag. Let $\left\{ X_t \right\}$ be a random process, and $t$ be any point in time ($t$ may be an integer for a discrete-time process or a real number for a continuous-time process). Then $X_t$ is the value (or realization) produced by a given run of the process at time $t$. Suppose that the process has mean $\mu_t$ and variance $\sigma_t^2$ at time $t$, for each $t$. Then the definition of the auto-correlation function between times $t_1$ and $t_2$ is

[$]\operatorname{R}_{XX}(t_1,t_2) = \operatorname{E} \left[ X_{t_1} \overline{X}_{t_2}\right][$]

where $\operatorname{E}$ is the expected value operator and the bar represents complex conjugation. Note that the expectation may not be well defined.

Subtracting the mean before multiplication yields the auto-covariance function between times $t_1$ and $t_2$:

[$]\operatorname{K}_{XX}(t_1,t_2) = \operatorname{E} \left[ (X_{t_1} - \mu_{t_1})\overline{(X_{t_2} - \mu_{t_2})} \right] = \operatorname{E}\left[X_{t_1} \overline{X}_{t_2} \right] - \mu_{t_1}\overline{\mu}_{t_2}[$]

Note that this expression is not well defined for all time series or processes, because the mean may not exist, or the variance may be zero (for a constant process) or infinite (for processes with distribution lacking well-behaved moments, such as certain types of power law).

### Definition for wide-sense stationary stochastic process

If $\left\{ X_t \right\}$ is a wide-sense stationary process, then the mean $\mu$ and the variance $\sigma^2$ are time-independent, and the autocovariance function depends only on the lag between $t_1$ and $t_2$, that is, on the time-distance between the pair of values and not on their position in time. The autocovariance and auto-correlation can therefore be expressed as functions of the time-lag $\tau=t_1-t_2$, and for a real-valued process each is an even function of $\tau$. This gives the more familiar forms for the auto-correlation function:

[$]\operatorname{R}_{XX}(\tau) = \operatorname{E}\left[X_{t+\tau} \overline{X}_{t} \right][$]

and the auto-covariance function:

[$]\operatorname{K}_{XX}(\tau) = \operatorname{E}\left[ (X_{t+\tau} - \mu)\overline{(X_{t} - \mu)} \right] = \operatorname{E} \left[ X_{t+\tau} \overline{X}_{t} \right] - \mu\overline{\mu}[$]
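For a real-valued series that is assumed to be wide-sense stationary, the expectation in $\operatorname{K}_{XX}(\tau)$ can be estimated by averaging over $t$ within a single realisation. The sketch below does this for a simulated first-order moving average process $X_t = e_t + \theta e_{t-1}$ with unit-variance Gaussian shocks, whose autocovariance is $1+\theta^2$ at lag 0, $\theta$ at lag 1 and zero beyond. The coefficient $\theta = 0.5$, the seed, the series length and the function name are assumptions made for this example.

```python
import random

def sample_autocovariance(series, lag):
    """Estimate K_XX(lag) for a real series, assuming wide-sense stationarity."""
    n = len(series)
    mean = sum(series) / n
    return sum(
        (series[t + lag] - mean) * (series[t] - mean) for t in range(n - lag)
    ) / n

theta = 0.5              # hypothetical MA(1) coefficient
rng = random.Random(1)   # fixed seed so the run is reproducible
shocks = [rng.gauss(0.0, 1.0) for _ in range(20_001)]

# MA(1): X_t = e_t + theta * e_{t-1}, with unit-variance shocks e_t.
series = [shocks[t] + theta * shocks[t - 1] for t in range(1, len(shocks))]

estimates = [sample_autocovariance(series, lag) for lag in range(4)]
theory = [1 + theta ** 2, theta, 0.0, 0.0]
for lag, (estimate, exact) in enumerate(zip(estimates, theory)):
    print(f"lag {lag}: estimate = {estimate:+.3f}, theory = {exact:+.3f}")
```

The estimates should agree with the theoretical values up to sampling error, which shrinks as the series gets longer.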

### Normalization

It is common practice in some disciplines (e.g. statistics and time series analysis) to normalize the autocovariance function to get a time-dependent Pearson correlation coefficient. However, in other disciplines (e.g. engineering) the normalization is usually dropped and the terms "autocorrelation" and "autocovariance" are used interchangeably.

The definition of the auto-correlation coefficient of a stochastic process is

[$]\rho_{XX}(t_1,t_2) = \frac{\operatorname{K}_{XX}(t_1,t_2)}{\sigma_{t_1}\sigma_{t_2}} = \frac{\operatorname{E}\left[(X_{t_1} - \mu_{t_1}) \overline{(X_{t_2} - \mu_{t_2})} \right]}{\sigma_{t_1}\sigma_{t_2}} .[$]

If the function $\rho_{XX}$ is well defined, its value must lie in the range $[-1,1]$, with 1 indicating perfect correlation and −1 indicating perfect anti-correlation.

For a wide-sense stationary (WSS) process, the definition is

[$]\rho_{XX}(\tau) = \frac{\operatorname{K}_{XX}(\tau)}{\sigma^2} = \frac{\operatorname{E} \left[(X_{t+\tau} - \mu)\overline{(X_{t} - \mu)}\right]}{\sigma^2}[$]

where

[$]\operatorname{K}_{XX}(0) = \sigma^2 .[$]

The normalization is important both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the strength of statistical dependence, and because the normalization has an effect on the statistical properties of the estimated autocorrelations.
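The scale-free property can be seen with a small hand-checkable example: changing the units of a series rescales its sample autocovariance but leaves its sample autocorrelation coefficient unchanged. The data and helper names below are hypothetical, and the estimator (dividing by the series length) is one common convention.

```python
def autocovariance(series, lag):
    """Sample autocovariance of a real series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    return sum(
        (series[t + lag] - mean) * (series[t] - mean) for t in range(n - lag)
    ) / n

def autocorrelation(series, lag):
    """Sample autocorrelation coefficient: autocovariance divided by the variance."""
    return autocovariance(series, lag) / autocovariance(series, 0)

series = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
rescaled = [10.0 * x for x in series]  # same data expressed in different units

print(autocovariance(series, 1), autocovariance(rescaled, 1))    # differ by a factor of 100
print(autocorrelation(series, 1), autocorrelation(rescaled, 1))  # agree up to rounding
```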

### Properties

#### Symmetry property

The fact that the auto-correlation function $\operatorname{R}_{XX}$ is an even function can be stated as

[$]\operatorname{R}_{XX}(t_1,t_2) = \overline{\operatorname{R}_{XX}(t_2,t_1)}[$]

and, for a WSS process, as

[$]\operatorname{R}_{XX}(\tau) = \overline{\operatorname{R}_{XX}(-\tau)} .[$]

#### Maximum at zero

For a WSS process:

[$]\left|\operatorname{R}_{XX}(\tau)\right| \leq \operatorname{R}_{XX}(0)[$]

Notice that $\operatorname{R}_{XX}(0)$ is always real.
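The same bound holds for a simple sample estimate of $\operatorname{R}_{XX}$ computed from a finite real series, as the sketch below checks. The estimator (no mean removal, division by the series length at every lag), the data and the function name are assumptions made for illustration.

```python
def sample_autocorrelation_function(series, lag):
    """Estimate R_XX(lag) = E[X_{t+lag} X_t] for a real series (no mean removal)."""
    n = len(series)
    return sum(series[t + lag] * series[t] for t in range(n - lag)) / n

series = [2.0, -1.0, 0.5, 3.0, -2.5, 1.0, 0.0, -0.5, 1.5, 2.0]
r = [sample_autocorrelation_function(series, lag) for lag in range(len(series))]

# The lag-0 value is the mean square of the series and bounds every other lag.
largest_other = max(abs(value) for value in r[1:])
print(f"R(0) = {r[0]:.3f}, largest |R(lag)| for lag > 0 = {largest_other:.3f}")
```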

#### Cauchy–Schwarz inequality

The Cauchy–Schwarz inequality for stochastic processes:

[$]\left|\operatorname{R}_{XX}(t_1,t_2)\right|^2 \leq \operatorname{E}\left[ |X_{t_1}|^2\right] \operatorname{E}\left[|X_{t_2}|^2\right][$]

#### Wiener–Khinchin theorem

The Wiener–Khinchin theorem relates the autocorrelation function $\operatorname{R}_{XX}$ to the power spectral density $S_{XX}$ via the Fourier transform:

[$]\operatorname{R}_{XX}(\tau) = \int_{-\infty}^\infty S_{XX}(f) e^{i 2 \pi f \tau} \, {\rm d}f[$]

[$]S_{XX}(f) = \int_{-\infty}^\infty \operatorname{R}_{XX}(\tau) e^{- i 2 \pi f \tau} \, {\rm d}\tau .[$]

For real-valued functions, the symmetric autocorrelation function has a real symmetric transform, so the Wiener–Khinchin theorem can be re-expressed in terms of real cosines only:

[$]\operatorname{R}_{XX}(\tau) = \int_{-\infty}^\infty S_{XX}(f) \cos(2 \pi f \tau) \, {\rm d}f[$]

[$]S_{XX}(f) = \int_{-\infty}^\infty \operatorname{R}_{XX}(\tau) \cos(2 \pi f \tau) \, {\rm d}\tau .[$]
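A finite, discrete analogue of this relationship can be checked directly: for a sequence of length $n$, the discrete Fourier transform of its circular autocorrelation equals its periodogram $|X_k|^2 / n$. The sketch below is an illustration under that discrete, circular assumption and not the continuous-time theorem itself; the naive `dft` helper and the sample data are hypothetical.

```python
import cmath

def dft(values):
    """Naive discrete Fourier transform, O(n^2); fine for short sequences."""
    n = len(values)
    return [
        sum(values[p] * cmath.exp(-2j * cmath.pi * k * p / n) for p in range(n))
        for k in range(n)
    ]

def circular_autocorrelation(x):
    """r[m] = (1/n) * sum_p x[(p + m) mod n] * conj(x[p])."""
    n = len(x)
    return [
        sum(x[(p + m) % n] * x[p].conjugate() for p in range(n)) / n
        for m in range(n)
    ]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
n = len(x)

# Transform of the autocorrelation versus the squared magnitude of the transform.
spectrum_from_autocorrelation = dft(circular_autocorrelation(x))
periodogram = [abs(value) ** 2 / n for value in dft(x)]

max_gap = max(
    abs(a - b) for a, b in zip(spectrum_from_autocorrelation, periodogram)
)
print(f"largest discrepancy: {max_gap:.2e}")
```

The two sides should agree to floating-point precision, and because the data are real the transform of the autocorrelation has a negligible imaginary part, in line with the cosine-only form above.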