Delta Method

The delta method is a result that gives the approximate probability distribution of a function of an asymptotically normal estimator, given the limiting variance of that estimator.

Method

While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables [math]X_n[/math] satisfying

[[math]]{\sqrt{n}[X_n-\theta]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2)},[[/math]]

where [math]\theta[/math] and [math]\sigma^2[/math] are finite valued constants and [math]\xrightarrow{D}[/math] denotes convergence in distribution, then

[[math]] {\sqrt{n}[g(X_n)-g(\theta)]\,\xrightarrow{D}\,\mathcal{N}(0,\sigma^2[g'(\theta)]^2)} [[/math]]

for any function [math]g[/math] such that [math]g'(\theta)[/math] exists and is non-zero.
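As a quick numerical check of the univariate statement, the sketch below simulates a concrete instance. All concrete choices are assumptions made for the illustration: [math]X_n[/math] is the sample mean of [math]n[/math] iid Exponential(1) draws, so [math]\theta = 1[/math] and [math]\sigma^2 = 1[/math], and [math]g(x) = x^2[/math], so the delta method predicts a limiting variance of [math]\sigma^2[g'(\theta)]^2 = 4[/math]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup for the illustration: X_n is the sample mean of n iid
# Exponential(1) draws, so theta = 1 and sigma^2 = 1.
n, reps = 1_000, 10_000
theta, sigma2 = 1.0, 1.0

# With g(x) = x**2 we have g'(theta) = 2*theta, and the delta method predicts
# sqrt(n) * (g(X_n) - g(theta)) -> N(0, sigma2 * (2*theta)**2) = N(0, 4).
x_bar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (x_bar**2 - theta**2)

print(z.var())  # close to 4 for large n and many replications
```

The empirical variance of the [math]10{,}000[/math] replications agrees with the predicted value 4 up to Monte Carlo error.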


The method extends to the multivariate case. By definition, a consistent estimator [math]B[/math] converges in probability to its true value [math]\beta[/math], and often a central limit theorem can be applied to obtain asymptotic normality:

[[math]]\sqrt{n} (B-\beta )\,\xrightarrow{D}\,\mathcal{N}(0, \Sigma ),[[/math]]

where [math]n[/math] is the number of observations and [math]\Sigma[/math] is a covariance matrix. The multivariate delta method yields the following asymptotic property of a function [math]h[/math] of the estimator [math]B[/math] under the assumption that the gradient [math]\nabla h(\beta)[/math] is non-zero:

Proposition (Multivariate delta method)

[[math]]\sqrt{n}(h(B)-h(\beta))\,\xrightarrow{D}\,\mathcal{N}(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)).[[/math]]
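The asymptotic variance in the proposition is simply the quadratic form [math]\nabla h(\beta)^T \Sigma \nabla h(\beta)[/math], which is easy to evaluate directly. The sketch below does so for a hypothetical ratio estimator [math]h(\beta)=\beta_1/\beta_2[/math]; the numerical values of [math]\beta[/math] and [math]\Sigma[/math] are assumptions chosen purely for illustration:

```python
import numpy as np

# Hypothetical example: B estimates beta = (beta1, beta2), and we study the
# ratio h(beta) = beta1 / beta2, whose gradient is
#   grad_h(beta) = (1/beta2, -beta1/beta2**2).
beta = np.array([2.0, 4.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])  # assumed asymptotic covariance of sqrt(n)(B - beta)

grad_h = np.array([1.0 / beta[1], -beta[0] / beta[1] ** 2])

# Asymptotic variance of sqrt(n)(h(B) - h(beta)) from the proposition:
asymptotic_var = grad_h @ Sigma @ grad_h
print(asymptotic_var)  # 0.075 for these assumed values
```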

Proof

We start with the univariate case. Demonstration of this result is fairly straightforward under the assumption that [math]g'(\theta)[/math] is continuous. To begin, we use the mean value theorem:

[[math]]g(X_n)=g(\theta)+g'(\tilde{\theta})(X_n-\theta),[[/math]]

where [math]\tilde{\theta}[/math] lies between [math]X_n[/math] and [math]\theta[/math]. Since [math]X_n\,\xrightarrow{P}\,\theta[/math] and [math]\tilde{\theta}[/math] is trapped between [math]X_n[/math] and [math]\theta[/math], it must be that [math]\tilde{\theta} \,\xrightarrow{P}\,\theta[/math], and since [math]g'[/math] is continuous at [math]\theta[/math], applying the continuous mapping theorem yields

[[math]]g'(\tilde{\theta})\,\xrightarrow{P}\,g'(\theta),[[/math]]

where [math]\xrightarrow{P}[/math] denotes convergence in probability. Rearranging the terms and multiplying by [math]\sqrt{n}[/math] gives

[[math]]\sqrt{n}[g(X_n)-g(\theta)]=g'(\tilde{\theta})\sqrt{n}[X_n-\theta].[[/math]]
Since

[[math]]{\sqrt{n}[X_n-\theta] \xrightarrow{D} \mathcal{N}(0,\sigma^2)}[[/math]]

by assumption, it follows immediately from Slutsky's theorem that

[[math]]{\sqrt{n}[g(X_n)-g(\theta)] \xrightarrow{D} \mathcal{N}(0,\sigma^2[g'(\theta)]^2)}.[[/math]]


Now we proceed to the multivariate case. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can approximate [math]h(B)[/math] as

[[math]]h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)[[/math]]

which implies that the variance of [math]h(B)[/math] is approximately

[[math]]\begin{align*} \operatorname{Var}(h(B)) & \approx \operatorname{Var}(h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)) \\ & = \operatorname{Var}(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta) \\ & = \operatorname{Var}(\nabla h(\beta)^T \cdot B) \\ & = \nabla h(\beta)^T \cdot \operatorname{Cov}(B) \cdot \nabla h(\beta) \\ & = \nabla h(\beta)^T \cdot (\Sigma / n) \cdot \nabla h(\beta) \end{align*} [[/math]]

One can use the mean value theorem (for real-valued functions of several variables) to see that the result does not rely on the first-order approximation being exact.

An argument analogous to the univariate case, with [math]g'[/math] replaced by the gradient, therefore implies that

[[math]]\sqrt{n}(h(B)-h(\beta))\,\xrightarrow{D}\,\mathcal{N}(0, \nabla h(\beta)^T \cdot \Sigma \cdot \nabla h(\beta)).[[/math]]
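The multivariate conclusion can likewise be checked by simulation. In the sketch below every concrete choice is an assumption made for the illustration: [math]B[/math] is the mean of [math]n[/math] iid draws from [math]\mathcal{N}(\beta, \Sigma)[/math], so that [math]\sqrt{n}(B-\beta) \sim \mathcal{N}(0,\Sigma)[/math] holds exactly, and [math]h(b) = b_1 b_2[/math]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: B is the mean of n iid N(beta, Sigma) observations, so
# B ~ N(beta, Sigma/n) exactly; h(b) = b[0]*b[1] has gradient (b[1], b[0]).
n, reps = 2_000, 20_000
beta = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

grad_h = np.array([beta[1], beta[0]])
predicted = grad_h @ Sigma @ grad_h          # asymptotic variance: 7.0

B = rng.multivariate_normal(beta, Sigma / n, size=reps)
z = np.sqrt(n) * (B[:, 0] * B[:, 1] - beta[0] * beta[1])

print(z.var(), predicted)  # the empirical variance should be close to 7
```

Sampling [math]B[/math] directly from [math]\mathcal{N}(\beta, \Sigma/n)[/math] avoids generating the underlying observations; only the distribution of [math]B[/math] matters for the check.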


References

  • Wikipedia contributors. "Delta method". Wikipedia. Retrieved 30 May 2019.