Statistical Models

Mack Method

The Mack chain ladder method is a distribution-free statistical model for estimating the development factors of the chain ladder method. It assumes the following:

  • Distinct rows of the array/matrix [math]C_{i,j}[/math] are independent.
  • [math]\operatorname{E}[C_{i,j+1} | C_{i,0},\ldots,C_{i,j}] = f_j C_{i,j}[/math] with [math]f_j[/math] a constant.
  • [math]\operatorname{Var}[C_{i,j+1} | C_{i,0},\ldots,C_{i,j}] = \sigma_{j}^2 C_{i,j}[/math] with [math]\sigma_j[/math] a constant.

The goal of the Mack method is to estimate the factors [math]f_j[/math] from the observable [math]C_{i,j}[/math]. The estimators, denoted [math]\hat{f}_j[/math], then serve as the selected age-to-age factors. They are defined as follows:

[[math]] \begin{equation} \hat{f}_j = \frac{\sum_{i=0}^{I-j-1}C_{i,j+1}}{\sum_{i=0}^{I-j-1}C_{i,j}}. \end{equation} [[/math]]

They have the following desirable properties:

Proposition (properties of estimators [math]\hat{f}_j[/math])

  • [math]\hat{f}_j[/math] is an unbiased estimator for [math]f_j[/math]: [math]\operatorname{E}[\hat{f}_j] = f_j[/math].
  • The estimator [math]\hat{f}_j[/math] has minimum variance in the following sense:
    [[math]] \hat{f}_j = \underset{X \in S_j}{\operatorname{argmin}} \operatorname{Var}[ X \mid A_j ],\quad S_j = \left\{\sum_{i=0}^{I-j-1}w_i C_{i,j+1}/C_{i,j} \,\Big|\, \sum_{i=0}^{I-j-1}w_i = 1\right\} [[/math]]
    with [math]A_j = \cup_{i=0}^{I-j-1}\{C_{i,0},\ldots,C_{i,j}\}[/math] the claims information available through development period [math]j[/math].

Proof

We first prove an intermediate result. Suppose [math]X_1,\ldots,X_n[/math] is a sequence of uncorrelated random variables. Among all convex linear combinations [math]\sum_{i=1}^n\alpha_iX_i[/math] with [math]\sum_{i=1}^n\alpha_i = 1[/math], the variance-minimizing combination has coefficients

[[math]] \begin{equation} \label{minvarcor} \alpha_i = \frac{1}{\operatorname{Var}[X_i] \sum_{k=1}^n \operatorname{Var}[X_k]^{-1}}. \end{equation} [[/math]]

The proof of this result is a simple application of the Cauchy-Schwarz inequality:

[[math]] \operatorname{Var}\left[\sum_{i=1}^n\alpha_iX_i\right] = \sum_{i=1}^n \alpha_i^2 \operatorname{Var}[X_i] \geq \left( \sum_{i=1}^n\operatorname{Var}[X_i]^{-1}\right)^{-1} \left(\sum_{i=1}^n \alpha_i\right)^2 = \left( \sum_{i=1}^n\operatorname{Var}[X_i]^{-1}\right)^{-1} [[/math]]

with equality if and only if \ref{minvarcor} holds. By the model assumptions, [math]\operatorname{Var}[C_{i,k+1}/C_{i,k} \mid C_{i,k}] = \sigma_k^2/C_{i,k}[/math]. Applying this intermediate result to the independent random variables [math]C_{i,k+1}/C_{i,k}[/math] for [math]i = 0,\ldots,I-k-1[/math], with the conditional variances in place of [math]\operatorname{Var}[X_i][/math], we obtain the estimator

[[math]] \begin{align*} \hat{f_{k}} &= \sum_{i=0}^{I - k-1} \frac{C_{i,k+1}/C_{i,k}}{\operatorname{Var}[ C_{i,k+1}/C_{i,k} | C_{i,k}] \sum_{j=0}^{I - k-1} \operatorname{Var}[ C_{j,k+1}/C_{j,k} | C_{j,k}] ^{-1}} \\ &= \sum_{i=0}^{I - k-1} \frac{C_{i,k+1}}{\sum_{j=0}^{I - k-1}C_{j,k}} \\ &= \frac{\sum_{i=0}^{I - k-1}C_{i,k+1}}{\sum_{i=0}^{I - k-1}C_{i,k}} \end{align*} [[/math]]

for [math]f_k[/math] with the required conditional minimum variance property. To show that the estimator [math]\hat{f}_k[/math] is unbiased, we express it as a randomly weighted average of the random variables [math]C_{i,k+1}/C_{i,k}[/math]:

[[math]] \hat{f}_k = \sum_{i=0}^{I-k-1}W_{i,k}\, C_{i,k+1}/C_{i,k}, \qquad W_{i,k} = \frac{C_{i,k}}{\sum_{j=0}^{I-k-1}C_{j,k}}, [[/math]]

so that [math]\sum_{i=0}^{I-k-1} W_{i,k} = 1[/math]. The weights [math]W_{i,k}[/math] are measurable with respect to [math]A_k[/math], so conditioning on [math]A_k[/math] and taking expectations gives:

[[math]] \operatorname{E}[\hat{f}_k] = \sum_{i=0}^{I-k-1}\operatorname{E}\left[W_{i,k}\, \operatorname{E}[C_{i,k+1}/C_{i,k} \mid A_k ]\right] = f_k \operatorname{E}\left[\sum_{i=0}^{I-k-1}W_{i,k}\right] = f_k. [[/math]]
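
The volume-weighted estimator is straightforward to compute. Below is a minimal sketch in Python, assuming a square cumulative triangle stored as a NumPy array with NaN in the unobserved cells; the triangle values and names are hypothetical.

```python
import numpy as np

def mack_factors(triangle):
    """Volume-weighted age-to-age factors f_hat_j for a cumulative
    run-off triangle with rows i = 0,...,I and NaN below the diagonal."""
    I = triangle.shape[0] - 1
    J = triangle.shape[1] - 1
    factors = []
    for j in range(J):
        rows = slice(0, I - j)  # rows i = 0,...,I-j-1 have C_{i,j+1} observed
        factors.append(np.nansum(triangle[rows, j + 1]) /
                       np.nansum(triangle[rows, j]))
    return np.array(factors)

# Hypothetical 4x4 cumulative claims triangle.
C = np.array([
    [100.0, 180.0, 220.0, 240.0],
    [110.0, 200.0, 245.0, np.nan],
    [ 95.0, 170.0, np.nan, np.nan],
    [120.0, np.nan, np.nan, np.nan],
])
f_hat = mack_factors(C)  # one estimated factor per development period
```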

Bühlmann-Straub Credibility Model

Recall that in the Bornhuetter-Ferguson method the expected ultimate claim [math]\mu_i[/math] for accident year [math]i[/math] may be unknown. In the Bühlmann-Straub credibility model, we estimate [math]\mu_i[/math] by applying Bühlmann-Straub credibility to the development triangle [math]C_{i,j}[/math]. To implement this method, we recall the Bühlmann-Straub credibility model assumptions as applied to the incremental claim triangle [math]X_{i,j} = C_{i,j}-C_{i,j-1}[/math], with [math]X_{i,0} = C_{i,0}[/math]:

  1. [math]\operatorname{E}[X_{i,j}\,|\,\Theta_i] = \gamma_j\mu(\Theta_i)[/math]
  2. [math]\operatorname{Var}[X_{i,j}\,|\,\Theta_i] = \gamma_j\sigma^2(\Theta_i)[/math]
  3. [math]\Theta_i[/math] are independent and identically distributed
  4. Conditionally on [math]\Theta_i[/math], the [math]X_{i,j}[/math], [math]j = 0,\ldots,J[/math], are independent
  5. [math]\sum_{j}\gamma_j = 1[/math]

The [math]\gamma_j[/math] can be interpreted as relative exposure levels that depend on the development year [math]j[/math]. To implement this method, we need to specify the relative exposure levels in advance or estimate them using, say, [[guide:|the standard estimation techniques]].

Notice that because the [math]\Theta_i[/math] are assumed to be identically distributed, the unconditional expected ultimate claim amount is the same for every accident year. The Bühlmann-Straub credibility model generally allows the exposure levels to depend on both indices [math]i,j[/math]. For stochastic reserving, we typically assume that the relative exposure levels [math]\gamma_j[/math] are the same across accident years, while the base exposure level may vary by accident year. A typical base exposure is the earned premium. If [math]P_i[/math] denotes the earned premium for accident year [math]i[/math], then we scale the data to [math]C_{i,j}/P_i[/math] to obtain claims per unit of exposure, as sketched below. The estimate of the ultimate claim for accident year [math]i[/math] is then simply [math]P_i[/math] multiplied by the ultimate claim estimate obtained from the scaled data.
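
As a small illustration of the scaling step, continuing with the hypothetical triangle [math]C[/math] from the Mack sketch above and hypothetical premiums:

```python
import numpy as np

# Hypothetical earned premium by accident year.
P = np.array([1000.0, 1100.0, 950.0, 1200.0])
C_scaled = C / P[:, None]  # claims per unit of exposure
# ...estimate scaled ultimates U_scaled from C_scaled with the method below...
# U = P * U_scaled         # rescale to ultimate claims per accident year
```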

Bühlmann-Straub Credibility Model
  1. Estimate the development factors [math]f_j[/math] using the volume-weighted method (Mack method) and set [math]\hat{\beta}_j = \prod_{k=j}^{J-1}\hat{f}_k^{-1}[/math], the estimated fraction of ultimate claims emerged through development year [math]j[/math] (with [math]\hat{\beta}_J = 1[/math]).
  2. Set [math]\hat{\gamma}_j = \hat{\beta}_j - \hat{\beta}_{j-1}[/math] for [math]j\gt0[/math] and [math]\hat{\gamma}_0 = \hat{\beta}_0[/math].
  3. The estimate for the ultimate claims for accident year [math]i[/math] equals
    [[math]]\hat{C}^{\textrm{BS2}}_{i,J} = \hat{\beta}_{I-i}\hat{C}_{i,J} + (1-\hat{\beta}_{I-i})\hat{C}^{\textrm{BS}}_{i,J}[[/math]]
    where [math]\hat{C}_{i,J}[/math] is the chain ladder estimate of ultimate claims for accident year [math]i[/math] based on the development factors [math]\hat{f}_j[/math], and [math]\hat{C}^{\textrm{BS}}_{i,J}[/math] is the Bühlmann-Straub credibility estimate for [math]\mu_i[/math] computed in step 4.
  4. The Bühlmann-Straub Credibility estimate for [math]\mu_i[/math] equals
    [[math]]\hat{C}^{\textrm{BS}}_{i,J} = Z_i\hat{C}_{i,J} + (1-Z_i)\hat{\mu}[[/math]]
    where
    [[math]]Z_i = \frac{\hat{\beta}_{I-i}}{\hat{\beta}_{I-i} + \hat{v}/\hat{a}}.[[/math]]
    Here [math]\hat{v}[/math] is an estimate for [math]\operatorname{E}[\sigma^2(\Theta_i)][/math], and [math]\hat{a}[/math] is an estimate for [math]\operatorname{Var}[\mu(\Theta_i)][/math].
  5. Compute the estimates [math]\hat{\mu}[/math], [math]\hat{a}[/math], and [math]\hat{v}[/math] using the formulas below, taking [math]\hat{\mu} = \overline{C}[/math] (a code sketch of the full recipe follows this list):
    [[math]]\begin{align*}m_i = \hat{\beta}_{I-i}, & \,\, s_i^2 = \frac{1}{I-i}\sum_{j=0}^{I-i}\hat{\gamma}_j \left(\frac{X_{i,j}}{\hat{\gamma}_j} - \hat{C}_{i,J}\right)^2 \\ m = \sum_i m_i, &\,\, \overline{C} = \frac{\sum_{i=0}^I C_{i,I-i}}{m} \\ \hat{v} = \frac{1}{I}\sum_{i=0}^{I-1} s_i^2, &\, \, \hat{a} = \frac{\sum_{i=0}^Im_i(\hat{C}_{i,J}-\overline{C})^2 - I\hat{v}}{m - \frac{1}{m}\sum_{i=0}^I m_i^2}.\end{align*}[[/math]]
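
The following is a minimal sketch of steps 1 through 5, under the same assumptions as the Mack sketch above (a square cumulative triangle as a NumPy array with NaN below the diagonal); all names are hypothetical, and in practice [math]\hat{a}[/math] would be floored at zero.

```python
import numpy as np

def buhlmann_straub_ultimates(C):
    """Sketch of steps 1-5 above for a square cumulative triangle C."""
    I = C.shape[0] - 1
    J = C.shape[1] - 1
    # Step 1: volume-weighted factors and emergence pattern beta_j.
    f = np.array([np.nansum(C[:I - j, j + 1]) / np.nansum(C[:I - j, j])
                  for j in range(J)])
    beta = np.append(1.0 / np.cumprod(f[::-1])[::-1], 1.0)  # beta_J = 1
    gamma = np.diff(beta, prepend=0.0)                      # step 2
    i = np.arange(I + 1)
    latest = C[i, I - i]                                    # latest diagonal
    m = beta[I - i]                                         # m_i = beta_{I-i}
    C_CL = latest / m                                       # chain ladder ultimates
    # Step 5: structural parameter estimates.
    X = np.diff(C, prepend=0.0, axis=1)                     # incremental claims
    s2 = np.zeros(I + 1)
    for r in range(I):                                      # rows with >= 2 cells
        j = np.arange(I - r + 1)
        s2[r] = np.sum(gamma[j] * (X[r, j] / gamma[j] - C_CL[r]) ** 2) / (I - r)
    v = s2[:I].mean()                                       # v_hat
    mu = latest.sum() / m.sum()                             # mu_hat = C_bar
    a = (np.sum(m * (C_CL - mu) ** 2) - I * v) / (m.sum() - np.sum(m ** 2) / m.sum())
    # Step 4: credibility estimates, then step 3: the final blend.
    Z = m / (m + v / a)
    C_BS = Z * C_CL + (1 - Z) * mu
    return m * C_CL + (1 - m) * C_BS                        # C_hat^{BS2}
```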

Poisson Model

The Poisson model assumes that the incremental claim counts [math]Y_{i,j} = N_{i,j}-N_{i,j-1}[/math], with [math]Y_{i,0} = N_{i,0}[/math], have the following properties:

  • The [math]Y_{i,j}[/math], [math] j = 0,\ldots, J[/math], are independent random variables for all [math]i[/math].
  • [math]Y_{i,j}[/math] is Poisson distributed with mean [math]\gamma_j \mu_i [/math] where [math]\sum_j \gamma_j = 1 [/math].

Under this model, it is a simple matter to check that the sequence [math]N_{i,j}[/math] satisfies the classical assumptions for the Bornhuetter-Ferguson method:

  • [math]\mu_i = \operatorname{E}[N_{i,J}] [/math] for all [math] 0 \leq i \leq I [/math]
  • [math]N_{i,J} - N_{i,j} [/math] is independent of [math]N_{i,j}[/math] for any [math]0 \leq j \lt J [/math]
  • [math]\operatorname{E}[N_{i,j}] = F_j \mu_i [/math] where [math]F_j = \sum_{k=0}^j \gamma_k [/math]
  • [math]\operatorname{Var}[N_{i,j}] = F_j \operatorname{Var}[N_{i,J}][/math]

Following the Bornhuetter-Ferguson method, the projection for [math]N_{i,J}[/math] equals

[[math]] \hat{N}_{i,J} = N_{i,I - i} + (1 - F_{I-i})\mu_i. [[/math]]

When the parameters [math]\mu_i[/math] and [math]\gamma_j[/math] are unknown, they can be estimated using maximum likelihood estimation.

When the [math]\mu_i[/math] are known, the maximum likelihood estimators for the [math]\gamma_j[/math] yield estimated development factors
[[math]]\hat{f}_j = \frac{\sum_{k=0}^{j+1}\hat{\gamma}_k}{\sum_{k=0}^j\hat{\gamma}_k}[[/math]]
that are equal to those obtained via the Mack method.
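
As a quick illustration of this relation, together with the Bornhuetter-Ferguson projection above (all numbers hypothetical):

```python
import numpy as np

gamma_hat = np.array([0.50, 0.30, 0.15, 0.05])  # hypothetical pattern, sums to 1
F = np.cumsum(gamma_hat)                        # F_j = gamma_0 + ... + gamma_j
f_hat = F[1:] / F[:-1]                          # f_hat_j = F_{j+1} / F_j

# Bornhuetter-Ferguson projection for a single accident year:
mu_i = 1000.0         # prior expected ultimate count
latest, j = 930.0, 2  # latest observed count N_{i,I-i}, at development year j = I-i
N_ult = latest + (1 - F[j]) * mu_i              # here 930 + 0.05 * 1000 = 980
```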

Overdispersed Poisson Model (ODP)

Unlike the Poisson model, the overdispersed Poisson model does not assume a particular probability distribution for the incremental claims development triangle [math]Y_{i,j}[/math]; it specifies only the first two moments. The overdispersed Poisson model is a special case of a generalized linear model:

  • [math]Y_{i,j}[/math], [math] j = 0,\ldots, J[/math], are independent random variables, for all [math] i, j [/math]
  • [math]\operatorname{E}[Y_{i,j}] = \gamma_je^{c+ \alpha_i + \beta_j}[/math], where [math] c\gt 0, \beta_0 = 0, \alpha_0 = 0[/math] and [math]\sum_{j}\gamma_j = 1[/math].
  • [math]\operatorname{Var}[Y_{i,j}] = \phi \operatorname{E}[Y_{i,j}][/math], where [math]\phi \gt 0[/math] is the dispersion parameter.

Generalized Linear Models

In a generalized linear model (GLM), each outcome [math]Y[/math] of the dependent variables is assumed to be generated from a particular distribution in an exponential family, a large class of probability distributions that includes the normal, binomial, Poisson and gamma distributions, among others. The mean, [math]\mu[/math], of the distribution depends on the independent variables, [math]\mathbf{X}[/math], through:

[[math]]\operatorname{E}[Y|\mathbf{X}] = \mu = g^{-1}(\mathbf{X}\boldsymbol{\beta}) [[/math]]

where [math]\operatorname{E}[Y|\mathbf{X}][/math] is the expected value of [math]Y[/math] conditional on [math]\mathbf{X}[/math]; [math]\mathbf{X}\boldsymbol{\beta}[/math] is the linear predictor, a linear combination of unknown parameters [math]\boldsymbol{\beta}[/math]; [math]g[/math] is the link function. In this framework, the variance is typically a function, [math]V[/math], of the mean:

[[math]] \operatorname{Var}[Y|\mathbf{X}] = \operatorname{V}(g^{-1}(\mathbf{X}\boldsymbol{\beta})). [[/math]]

The unknown parameters, [math]\boldsymbol{\beta}[/math], are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.

Model components

The GLM consists of three elements:

1. A particular distribution for modeling [math] Y [/math] from among those which are considered exponential families of probability distributions,
2. A linear predictor [math]\eta = \mathbf{X} \boldsymbol{\beta}[/math], and
3. A link function [math]g[/math] such that [math]\operatorname{E}[Y \mid \mathbf{X}] = \mu = g^{-1}(\eta)[/math].

Probability distribution

An overdispersed exponential family of distributions is a generalization of an exponential family and the exponential dispersion model of distributions and includes those families of probability distributions, parameterized by [math]\theta[/math] and [math]\phi[/math], whose density functions [math]f[/math] can be expressed in the form

[[math]] f_Y(y \mid \theta, \phi) = h(y,\phi) \exp \left(\frac{b(\theta)T(y) - A(\theta)}{d(\phi)} \right). \,\![[/math]]

The dispersion parameter, [math]\phi[/math], typically is known and is usually related to the variance of the distribution. The functions [math]h(y,\phi)[/math], [math]b(\theta)[/math], [math]T(y)[/math], [math]A(\theta)[/math], and [math]d(\phi)[/math] are known. Many common distributions are in this family, including the normal, exponential, gamma, Poisson, Bernoulli, and (for fixed number of trials) binomial, multinomial, and negative binomial.

If [math]b(\theta)[/math] is the identity function, then the distribution is said to be in canonical form (or natural form). Note that any distribution can be converted to canonical form by rewriting [math]\theta[/math] as [math]\theta'[/math] and then applying the transformation [math]\theta = b(\theta')[/math]. It is always possible to express [math]A(\theta)[/math] in terms of the new parametrization, even if [math]b(\theta')[/math] is not a one-to-one function; see the comments on exponential families. If, in addition, [math]T(y)[/math] is the identity and [math]\phi[/math] is known, then [math]\theta[/math] is called the canonical parameter (or natural parameter) and is related to the mean through

[[math]] \mu = A'(\theta). \,\![[/math]]

Under this scenario, the variance of the distribution can be shown to be[1]

[[math]] A''(\theta) d(\phi). \,\![[/math]]
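
For example, the Poisson distribution with mean [math]\mu[/math] has probability mass function

[[math]] f_Y(y \mid \theta) = \frac{\mu^y e^{-\mu}}{y!} = \frac{1}{y!}\exp\left(y \log\mu - \mu\right), [[/math]]

which is in canonical form with [math]\theta = \log\mu[/math], [math]T(y) = y[/math], [math]A(\theta) = e^{\theta}[/math], [math]d(\phi) = 1[/math], and [math]h(y,\phi) = 1/y![/math]. The identities above give [math]\mu = A'(\theta) = e^{\theta}[/math] and variance [math]A''(\theta)d(\phi) = e^{\theta} = \mu[/math], as expected for a Poisson random variable.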

Linear predictor

The linear predictor is the quantity which incorporates the information about the independent variables into the model. The symbol η (Greek "eta") denotes a linear predictor. It is related to the expected value of the data through the link function.

[math]\eta[/math] is expressed as a linear combination (thus, "linear") of unknown parameters [math]\boldsymbol{\beta}[/math]. The coefficients of the linear combination are the entries of the matrix of independent variables [math]\mathbf{X}[/math], so that [math] \eta = \mathbf{X}\boldsymbol{\beta}.\,[/math]

Link function

The link function provides the relationship between the linear predictor and the mean of the distribution function. There are many commonly used link functions, and their choice is informed by several considerations. There is always a well-defined canonical link function which is derived from the exponential of the response's density function. However, in some cases it makes sense to try to match the domain of the link function to the range of the distribution function's mean, or use a non-canonical link function for algorithmic purposes, for example Bayesian probit regression.

When using a distribution function with a canonical parameter [math]\theta[/math], the canonical link function is the function that expresses [math]\theta[/math] in terms of [math]\mu[/math], i.e. [math]\theta = b(\mu)[/math]. For the most common distributions, the mean [math]\mu[/math] is one of the parameters in the standard form of the distribution's density function, and then [math]b(\mu)[/math] is the function as defined above that maps the density function into its canonical form. When using the canonical link function, [math]b(\mu) = \theta = \mathbf{X}\boldsymbol{\beta}[/math], which allows [math]\mathbf{X}^{\rm T} \mathbf{Y}[/math] to be a sufficient statistic for [math]\boldsymbol{\beta}[/math].

Following is a table of several exponential-family distributions in common use and the data they are typically used for, along with the canonical link functions and their inverses (sometimes referred to as the mean function, as done here).

Common distributions with typical uses and canonical link functions

Distribution | Support of distribution | Typical uses | Link name | Link function, [math]\mathbf{X}\boldsymbol{\beta}=g(\mu)\,\![/math] | Mean function
Normal | real: [math](-\infty,+\infty)[/math] | Linear-response data | Identity | [math]\mathbf{X}\boldsymbol{\beta}=\mu\,\![/math] | [math]\mu=\mathbf{X}\boldsymbol{\beta}\,\![/math]
Exponential, Gamma | real: [math](0,+\infty)[/math] | Exponential-response data, scale parameters | Negative inverse | [math]\mathbf{X}\boldsymbol{\beta}=-\mu^{-1}\,\![/math] | [math]\mu=-(\mathbf{X}\boldsymbol{\beta})^{-1}\,\![/math]
Inverse Gaussian | real: [math](0,+\infty)[/math] |  | Inverse squared | [math]\mathbf{X}\boldsymbol{\beta}=\mu^{-2}\,\![/math] | [math]\mu=(\mathbf{X}\boldsymbol{\beta})^{-1/2}\,\![/math]
Poisson | integer: [math]0,1,2,\ldots[/math] | Count of occurrences in fixed amount of time/space | Log | [math]\mathbf{X}\boldsymbol{\beta} = \ln(\mu) \,\![/math] | [math]\mu=\exp (\mathbf{X}\boldsymbol{\beta}) \,\![/math]

GLMs in Stochastic Reserving

The generalized linear models applicable to a triangle of random variables [math]Y_{i,j}[/math] are usually restricted to GLMs satisfying the following properties:

  • [math]\operatorname{E}[Y_{i,j}] = \mu_{i,j}[/math]
  • [math]\operatorname{Var}[Y_{i,j}] = \phi \mu_{i,j}^p[/math], where [math]p \geq 0[/math]
  • [math]\eta_{i,j} = g(\mu_{i,j}) = \mathbf{X}_{i,j}\boldsymbol{\beta}[/math]

The following table lists three families of distributions satisfying the properties above:

Distribution | Support of distribution | Link function, [math]\mathbf{X}\boldsymbol{\beta}=g(\mu)\,\![/math] | Mean function | [math]p[/math] | [math]\phi[/math]
Normal | real: [math](-\infty,+\infty)[/math] | [math]\mathbf{X}\boldsymbol{\beta}=\mu\,\![/math] | [math]\mu=\mathbf{X}\boldsymbol{\beta}\,\![/math] | 0 | [math]\sigma^2[/math]
Exponential, Gamma | real: [math](0,+\infty)[/math] | [math]\mathbf{X}\boldsymbol{\beta}=-\mu^{-1}\,\![/math] | [math]\mu=-(\mathbf{X}\boldsymbol{\beta})^{-1}\,\![/math] | 2 | [math]\phi[/math]
Poisson | integer: [math]0,1,2,\ldots[/math] | [math]\mathbf{X}\boldsymbol{\beta} = \ln(\mu) \,\![/math] | [math]\mu=\exp (\mathbf{X}\boldsymbol{\beta}) \,\![/math] | 1 | 1

The overdispersed Poisson model, introduced above, is a special case with log link

[[math]]\mathbf{X}_{i,j}\boldsymbol{\beta} = \log(\gamma_j) + c + \alpha_i + \beta_j = \log(\mu_{i,j}),[[/math]]

with [math]p = 1[/math] and a free dispersion parameter [math]\phi \gt 0[/math]. To prevent overfitting, we typically set [math]\alpha_0[/math] and [math]\beta_0[/math] to zero.
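
As an illustration, the ODP parameters can be estimated with standard GLM software by maximizing the Poisson quasi-likelihood. Below is a minimal sketch using statsmodels on a hypothetical incremental triangle; for simplicity the offsets [math]\log(\gamma_j)[/math] are folded into the development-year effects [math]\beta_j[/math], and [math]\phi[/math] is estimated from the Pearson statistic.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical incremental claims triangle (NaN = unobserved).
Y = np.array([
    [100.0, 80.0, 40.0, 20.0],
    [110.0, 85.0, 45.0, np.nan],
    [ 95.0, 75.0, np.nan, np.nan],
    [120.0, np.nan, np.nan, np.nan],
])
I = Y.shape[0] - 1

# Long format: one record per observed cell (i + j <= I).
cells = [(i, j) for i in range(I + 1) for j in range(I + 1 - i)]
y = np.array([Y[i, j] for i, j in cells])

# Design matrix: intercept c, accident-year dummies alpha_i (i > 0), and
# development-year dummies beta_j (j > 0), so that alpha_0 = beta_0 = 0.
X = np.zeros((len(cells), 1 + 2 * I))
X[:, 0] = 1.0
for n, (i, j) in enumerate(cells):
    if i > 0:
        X[n, i] = 1.0
    if j > 0:
        X[n, I + j] = 1.0

res = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu_hat = res.mu                            # fitted cell means
phi_hat = res.pearson_chi2 / res.df_resid  # dispersion estimate
```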

Estimation of Model Parameters

Suppose we have a random sample [math]y_1,\ldots,y_n[/math] where [math]y_i[/math] is sampled from a distribution with density function

[[math]] f(y; \, \theta_i, \phi) = \exp\left(\frac{y\theta_i - b(\theta_i)}{\phi/p_i} + c(y,\phi)\right). [[/math]]

Here the [math]p_i[/math] are known prior weights. The log-likelihood function for the random sample equals

[[math]]\begin{equation}\label{glm-log-lik} l = \sum_{i=1}^n \frac{y_i\theta_i - b(\theta_i)}{(\phi/p_i)} + c(y_i,\phi).\end{equation}[[/math]]

If we assume a canonical link function of the form [math]\theta_i = g(\mu_i)[/math], where [math]g[/math] denotes the link function, then the log-likelihood function depends on the unknown parameters [math]\boldsymbol{\beta}[/math] through [math]\mu_i = g^{-1}(\mathbf{x}_i\boldsymbol{\beta})[/math]. To estimate these unknown parameters, we use maximum likelihood estimation. The maximum likelihood estimates can be found using an iteratively reweighted least squares algorithm or Newton's method with updates of the form:

[[math]] \boldsymbol\beta^{(t+1)} = \boldsymbol\beta^{(t)} + \mathcal{J}^{-1}(\boldsymbol\beta^{(t)}) u(\boldsymbol\beta^{(t)}), [[/math]]

where [math]\mathcal{J}(\boldsymbol\beta^{(t)})[/math] is the observed information matrix (the negative of the Hessian matrix) and [math]u(\boldsymbol\beta^{(t)})[/math] is the score function; or a Fisher's scoring method:

[[math]] \boldsymbol\beta^{(t+1)} = \boldsymbol\beta^{(t)} + \mathcal{I}^{-1}(\boldsymbol\beta^{(t)}) u(\boldsymbol\beta^{(t)}), [[/math]]

where [math]\mathcal{I}(\boldsymbol\beta^{(t)})[/math] is the Fisher information matrix. When the canonical link function is used, the following algorithm is equivalent to the Fisher scoring algorithm given above:

MLE estimates for GLM: iterative least squares method

We iterate the following algorithm (a generic implementation sketch follows the list):

  1. Create the sequence
    [[math]]z_i = \hat{\eta}_i + (y_i - \hat{\mu}_i)\frac{d\eta_i}{d\mu_i}[[/math]]
    where [math]\hat{\eta}_i = g(\hat{\mu}_i) [/math] and [math]\hat{\mu_i}[/math] is the current best estimate for [math]\mu_i.[/math]
  2. Compute the weights
    [[math]]w_i = \frac{p_i}{b''(\theta_i) \left(\frac{d\eta_i}{d\mu_i}\right)^2}.[[/math]]
  3. Estimate [math]\boldsymbol{\beta}[/math] using weighted least-squares regression where [math]z_i[/math] is the dependent variable, [math]\mathbf{x}_i[/math] are the predictor variables, and [math]w_i[/math] are the weights:
    [[math]]\hat{\boldsymbol{\beta}} = (X^T W X)^{-1}X^TWz, \, W_{ii} = w_i. [[/math]]
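
A minimal generic implementation of this iteration, assuming the caller supplies the link [math]g[/math], its inverse, the derivative [math]d\eta/d\mu[/math], and [math]b''(\theta)[/math], each as a function of [math]\mu[/math] (all names hypothetical):

```python
import numpy as np

def irls(y, X, g, g_inv, deta_dmu, b2, p=None, n_iter=25):
    """Iterative least squares sketch for a GLM (steps 1-3 above)."""
    p = np.ones(len(y)) if p is None else p
    mu = np.clip(y.astype(float), 0.5, None)        # crude starting values
    for _ in range(n_iter):
        z = g(mu) + (y - mu) * deta_dmu(mu)         # step 1: working response
        w = p / (b2(mu) * deta_dmu(mu) ** 2)        # step 2: working weights
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)  # step 3: weighted LS
        mu = g_inv(X @ beta)
    return beta

# Example: Poisson regression with the canonical log link, where
# g = log, d(eta)/d(mu) = 1/mu, and b''(theta) = mu.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
beta_hat = irls(y, X, np.log, np.exp, lambda m: 1.0 / m, lambda m: m)
```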


For the overdispersed Poisson model we do not assume anything about the distribution of the random sample; consequently, we cannot obtain maximum likelihood estimates. Instead we obtain quasi-maximum-likelihood estimates by maximizing the log-likelihood \ref{glm-log-lik} associated with the log link [math]\eta_i = g(\mu_i) = \log(\mu_i)[/math]. In this special case, we have [math]b''(\theta_i) = \mu_i[/math] and [math]\frac{d\eta_i}{d\mu_i} = 1/\mu_i[/math], so the working weights reduce to [math]w_i = p_i\hat{\mu}_i[/math].

QMLE estimates for overdispersed Poisson model

We iterate the following algorithm (a short implementation sketch follows the list):

  1. Create the sequence
    [[math]]z_i = \log(\hat{\mu}_i) + \frac{y_i - \hat{\mu}_i}{\hat{\mu}_i}[[/math]]
    where [math]\hat{\mu_i}[/math] is the current best estimate for [math]\mu_i.[/math]
  2. Estimate [math]\boldsymbol{\beta}[/math] using weighted least-squares regression where [math]z_i[/math] is the dependent variable and [math]\mathbf{x}_i[/math] are the predictor variables:
    [[math]]\hat{\boldsymbol{\beta}} = (X^T W X)^{-1}X^TWz, \, W_{ii} = \hat{\mu}_i. [[/math]]
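
The specialization is short enough to write out directly; a sketch under the same assumptions as above (unit prior weights, so the working weights are [math]\hat{\mu}_i[/math]):

```python
import numpy as np

def odp_irls(y, X, n_iter=25):
    """QMLE sketch for the overdispersed Poisson model (log link)."""
    mu = np.clip(y.astype(float), 0.5, None)        # starting values
    for _ in range(n_iter):
        z = np.log(mu) + (y - mu) / mu              # step 1: working response
        WX = X * mu[:, None]                        # step 2: weights W_ii = mu_i
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)  # weighted least squares
        mu = np.exp(X @ beta)
    phi = np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])  # Pearson dispersion
    return beta, phi
```

Applied to the long-format design matrix from the ODP sketch above, this iteration produces the same quasi-likelihood estimates.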


References

  1. McCullagh & Nelder 1989, Chapter 2.

Wikipedia References

  • Wikipedia contributors. "Generalized linear model". Wikipedia. Retrieved 18 February 2023.