Bühlmann Credibility

Bühlmann credibility theory offers linear estimation methods suitable for a class of random effects models. Although generally less accurate than Bayesian credibility estimates, Bühlmann credibility estimates have a simple closed form that depends on only two parameters, both of which can be estimated from available claims data.

Linear Approximation to MMSE

Suppose [math]X_1,\ldots, X_n[/math] represent data that we wish to use to estimate an unobservable random variable [math]Y[/math]. The Bayesian credibility estimate is usually difficult to compute without adding conditions on the random variables [math]X_i[/math] and [math]Y[/math]. Instead of putting additional constraints on the random variables, we consider the best linear approximation of the minimum mean square estimator (Bayesian credibility estimate):

[[math]] \begin{equation}\label{least-squares-linear-gen} \min \operatorname{E}\left[(Z -Y)^2 \right],\, Z = a_0 + \sum_{i}a_{i}X_{i}. \end{equation} [[/math]]

The estimator obtained by solving \ref{least-squares-linear-gen} is generally less accurate than the minimum mean square estimator but is usually far easier to compute: there is a trade-off between predictive power and computability.

The Normal Equations

Notice that we haven't really specified any special attributes on the random variables other than requiring that they have finite variances (or simply finite second raw moments). The (unique) solution to \ref{least-squares-linear-gen}, denoted by [math]\hat{Y}[/math], is characterized by the following normal equations:

  1. [math]\operatorname{E}[Y] = \hat{a}_0 + \sum_{i}\hat{a}_i \operatorname{E}[X_i]=\operatorname{E}[\hat{Y}][/math] (unbiasedness condition)
  2. [math] \operatorname{E}[Y X_k]=\operatorname{E}[\hat{Y}X_k]=\hat{a}_0\operatorname{E}[X_k] + \sum_{i}\hat{a}_i \operatorname{E}[X_i X_k] \,\, \text{for all} \,\, k[/math].

When the random variables in question all have the same expectation, the normal equations reduce to

  1. [math]\operatorname{E}[Y] = \hat{a}_0 + \sum_{i}\hat{a}_i \operatorname{E}[Y][/math] (unbiasedness condition)
  2. [math] \operatorname{Cov}(Y,X_k)=\operatorname{Cov}(\hat{Y},X_k)=\sum_{i}\hat{a}_i \operatorname{Cov}(X_i,X_k) \,\, \text{for all} \,\, k[/math].
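The covariance form of the normal equations is a plain linear system in the coefficients, with the intercept then fixed by the unbiasedness condition. The following is a minimal sketch of that computation, not a reference implementation: the function names `solve_linear_system` and `best_linear_predictor` and the numbers in the example are illustrative assumptions, and the covariance matrix is assumed to be nonsingular.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def best_linear_predictor(
    mean_y: float,
    mean_x: Sequence[float],
    cov_xx: Sequence[Sequence[float]],
    cov_yx: Sequence[float],
) -> Tuple[float, List[float]]:
    """Return (a_0, [a_1, ..., a_n]) solving the normal equations."""
    # Equation 2 (covariance form): sum_i a_i Cov(X_i, X_k) = Cov(Y, X_k).
    weights = solve_linear_system(cov_xx, cov_yx)
    # Equation 1 (unbiasedness): E[Y] = a_0 + sum_i a_i E[X_i].
    intercept = mean_y - sum(w * m for w, m in zip(weights, mean_x))
    return intercept, weights


# Hypothetical example: three observations with common mean 100,
# variance 500 and pairwise covariance 100; Cov(Y, X_k) = 100.
cov_xx = [[500.0, 100.0, 100.0], [100.0, 500.0, 100.0], [100.0, 100.0, 500.0]]
a0, a = best_linear_predictor(100.0, [100.0] * 3, cov_xx, [100.0] * 3)
print(a0, a)
```

In this symmetric example every observation receives the same weight, which is the pattern exploited by the credibility estimators discussed in this article.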

The Bühlmann Model

The Bühlmann model generates data as follows:

  • [math]\Theta_i,\, i=1,\dots,I[/math] are mutually independent, identically distributed random variables representing risk classes.
  • The data is represented by the random variables [math]X_{ij},\,(i=1,\dots,I \,\,, j=1,\dots,n)[/math] and, writing [math]X_i = (X_{i1},\dots,X_{in})[/math], the pairs [math](X_i,\Theta_i)\, (i=1,\dots,I)[/math] are mutually independent, identically distributed. Furthermore, the random variables [math]X_{i1},\dots,X_{in}[/math] are mutually independent and identically distributed conditional on the value of [math]\Theta_i[/math]. In other words, the data is generated in a two-step process: generate [math]I[/math] risk classes, then generate for each risk class an i.i.d. sequence of [math]n[/math] random variables with a common distribution that depends on the corresponding risk class.
  • The conditional mean (expectation) of [math]X_{ij}[/math] given [math]\Theta_i[/math] is denoted by [math]\mu(\Theta_i)[/math] and the unconditional mean (the expectation of the conditional mean) is given by [math]\mu[/math].
  • The random variables [math]X_{ij}[/math] all have finite variances,

[[math]] \begin{equation} \label{vhm} \sigma^2 = \operatorname{E}[\sigma^2(\Theta_i)],\,\, \sigma^2(\Theta_i) = \operatorname{Var}(X_{ij}\,|\,\Theta_i) \end{equation} [[/math]]

denotes the expected process variance (EPV) and

[[math]] \begin{equation} \label{epv} \rho^2 = \operatorname{Var}[\mu(\Theta_i)] \end{equation} [[/math]]

denotes the variance of the hypothetical mean (VHM).

Credibility Estimator

We are interested in estimating [math]\mu(\Theta_i)[/math] for each [math]i[/math] via credibility estimators (see Credibility Estimators and set [math]Y=\mu(\Theta_i)[/math]). We start with the simple Bühlmann model and set [math]I = 1[/math], writing [math]\Theta[/math] for [math]\Theta_1[/math] and [math]X_i[/math] for [math]X_{1i}[/math]. Since the optimization problem \ref{least-squares-linear-gen} doesn't depend on the ordering of the data [math]X_i[/math], its solution is symmetric in the data and we may impose the constraint [math]a_i=a_j[/math] for [math]i,j \gt0[/math]. By equation 1 of the normal equations for this model, we must have

[[math]] \hat{\mu}(\Theta) = \mu (1- \alpha) + \frac{\alpha}{n} \sum_{i}X_i = \mu (1- \alpha) + \alpha \overline{X} [[/math]]

for some [math]0 \leq \alpha \leq 1[/math]. By equation 2 of the normal equations, we must also have

[[math]] \begin{equation} \label{simple-normaleqn} \frac{\alpha}{n}\sum_{i}\operatorname{Cov}(X_i,X_k) = \operatorname{Cov}(\mu(\Theta),X_k) \quad \text{for all} \,\, k \end{equation} [[/math]]

with

[[math]] \begin{eqnarray} \label{simple-coveqn-1}\operatorname{Cov}(X_i,X_k) = \begin{cases} \operatorname{E}[\operatorname{Var}(X_k|\Theta)] + \operatorname{Var}[\mu(\Theta)] & i=k \\ \operatorname{Var}[\mu(\Theta)] & i\neq k \end{cases} \\ \label{simple-coveqn-2}\operatorname{Cov}(\mu(\Theta),X_k) = \operatorname{Cov}(\mu(\Theta),\operatorname{E}[X_k|\Theta]) = \operatorname{Var}[\mu(\Theta)]. \end{eqnarray} [[/math]]

Using \ref{simple-coveqn-1} and \ref{simple-coveqn-2} in \ref{simple-normaleqn} and then solving for [math]\alpha[/math] yields the following credibility estimator:

[[math]] \begin{equation} \label{} \hat{\mu}(\Theta) = (1- \alpha) \mu + \alpha \overline{X},\quad \alpha = \frac{n}{n + \kappa},\quad \kappa = \frac{\sigma^2}{\rho^2}. \end{equation} [[/math]]
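The algebra behind this step is short. For fixed [math]k[/math], the sum in \ref{simple-normaleqn} contains one term with [math]i=k[/math] and [math]n-1[/math] terms with [math]i\neq k[/math]. Writing [math]\sigma^2[/math] for the expected process variance and [math]\rho^2[/math] for the variance of the hypothetical mean, and assuming [math]\rho^2 \gt 0[/math], this gives

[[math]] \frac{\alpha}{n}\left[\sigma^2 + n\rho^2\right] = \rho^2 \quad \Longrightarrow \quad \alpha = \frac{n\rho^2}{\sigma^2 + n\rho^2} = \frac{n}{n + \sigma^2/\rho^2} = \frac{n}{n+\kappa}. [[/math]]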

The approach to deriving the credibility estimator for the simple Bühlmann model can be imitated to derive the credibility estimator for the general Bühlmann model (arbitrary number of risk classes). The credibility estimator equals

[[math]] \hat{\mu}(\Theta_i) = (1- \alpha) \mu + \alpha \overline{X}_i,\quad \alpha = \frac{n}{n + \kappa},\quad \kappa = \frac{\sigma^2}{\rho^2} [[/math]]

with [math]\overline{X}_i[/math] denoting the average value for the data generated from risk class [math]i[/math]:

[[math]] \overline{X}_i = \frac{1}{n} \sum_{j=1}^n X_{ij}. [[/math]]

Since data from distinct risk classes are mutually independent, it shouldn't be surprising that [math]\hat{\mu}(\Theta_i)[/math] depends only on the data corresponding to risk class [math]\Theta_i[/math].
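As an illustration of the formula for a single risk class, here is a minimal sketch in Python. It assumes that [math]\mu[/math], [math]\sigma^2[/math] and [math]\rho^2[/math] are known; the function name `buhlmann_estimate` and the numbers in the example are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def buhlmann_estimate(xs: Sequence[float], mu: float, sigma2: float, rho2: float) -> float:
    """Return (1 - alpha) * mu + alpha * mean(xs) with alpha = n / (n + kappa)."""
    if rho2 < 0:
        raise ValueError("rho2 must be nonnegative")
    if rho2 == 0:
        # No variation between risk classes: the data carries no credibility.
        return mu
    n = len(xs)
    kappa = sigma2 / rho2          # kappa = EPV / VHM
    alpha = n / (n + kappa)        # credibility factor
    return (1 - alpha) * mu + alpha * fmean(xs)


# Hypothetical inputs: three observations, mu = 100, EPV = 400, VHM = 100,
# so kappa = 4 and alpha = 3/7.
print(buhlmann_estimate([120.0, 90.0, 150.0], mu=100.0, sigma2=400.0, rho2=100.0))
```

The credibility factor grows towards 1 as the number of observations increases, so the estimate moves from the collective mean towards the class average.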

Bühlmann-Straub Model

The Bühlmann-Straub Model is similar to the Bühlmann Model except that we introduce time-varying exposure levels and allow the number of observations to vary from one risk class to another. More precisely, think of the index [math]j[/math] in the Bühlmann model as a time index and let [math]v_{ij}[/math] denote the exposure level (volume measure) associated with the average loss per exposure unit [math]X_{ij}[/math]. The total loss associated with risk [math]i[/math] at time [math]j[/math] is thus [math]v_{ij}X_{ij}[/math]. The Bühlmann model is the special case of the Bühlmann-Straub model obtained by setting the exposure levels to 1 at all times and for all risk classes, i.e., [math]v_{ij} = 1[/math] for all [math]i[/math] and [math]j[/math]. We also impose the following two conditional moment conditions:

  1. [math]\operatorname{E}[X_{ij}\,|\,\Theta_i] = \mu(\Theta_i)[/math] (conditional mean is exposure invariant)
  2. [math]\operatorname{Var}(X_{ij}\,|\,\Theta_i) = \sigma^2(\Theta_i)/v_{ij}[/math] (conditional variance scales inversely with exposure).


Note that [math]\mu(\Theta_i)[/math] and [math]\sigma^2(\Theta_i)[/math] are the conditional mean and conditional variance, respectively, when the exposure level is 1.

Credibility Estimator

As with the Bühlmann model, we can derive the credibility estimator for the Bühlmann-Straub model by solving the normal equations. The credibility estimator is

[[math]] \begin{equation} \label{bs-cred-estimator} \hat{\mu}(\Theta_i) = (1- \alpha_i) \mu + \alpha_i \overline{X}_i,\quad \alpha_i = \frac{v_i}{v_i + \kappa},\quad \kappa = \frac{\sigma^2}{\rho^2} \end{equation} [[/math]]

with [math]\overline{X}_i[/math] denoting the exposure weighted average of the data (losses) generated from risk class [math]\Theta_i[/math]:

[[math]] \overline{X}_i = v_i^{-1} \sum_{j=1}^{n_i} v_{ij} X_{ij},\,\, v_i = \sum_{j=1}^{n_i} v_{ij}. [[/math]]
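The estimator for one risk class can be sketched as follows, assuming [math]\mu[/math] and [math]\kappa[/math] are known. The function name `buhlmann_straub_estimate` and the example figures are hypothetical.

```python
from typing import Sequence, Tuple


def buhlmann_straub_estimate(
    xs: Sequence[float], vs: Sequence[float], mu: float, kappa: float
) -> Tuple[float, float]:
    """Return (alpha_i, credibility estimate) for one risk class.

    xs are average losses per exposure unit and vs the matching exposures.
    """
    v_i = sum(vs)                                        # total exposure
    x_bar = sum(v * x for v, x in zip(vs, xs)) / v_i     # exposure-weighted mean
    alpha_i = v_i / (v_i + kappa)                        # credibility factor
    return alpha_i, (1 - alpha_i) * mu + alpha_i * x_bar


# Hypothetical inputs: two periods with exposures 10 and 30.
alpha_i, estimate = buhlmann_straub_estimate([0.10, 0.20], [10.0, 30.0], mu=0.12, kappa=60.0)
print(alpha_i, estimate)
```

Setting every exposure to 1 recovers the Bühlmann credibility factor [math]n/(n+\kappa)[/math].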

Application to Claim Frequencies

We consider the special case when the data represent claim frequencies. More precisely, we assume the same setup as with the Bühlmann-Straub model but we replace the notation a little bit by replacing [math]X_{ij}[/math] with [math]F_{ij}[/math] to denote the fact that we're modelling average claim frequency per exposure unit. We also have the following additional assumption for the model:

  • The total number of claims, [math]N_{ij} = v_{ij}F_{ij}[/math], for risk class [math]\Theta_i[/math] at time [math]j[/math] is, conditional on [math]\Theta_i[/math], Poisson distributed with mean [math]v_{ij}\Theta_i[/math].


Straightforward calculations show that the moment conditions imposed by the Bühlmann-Straub model are satisfied, with [math]\mu(\Theta_i) = \Theta_i[/math] and [math]\sigma^2(\Theta_i) = \Theta_i[/math]:
[[math]] \operatorname{E}[F_{ij}\,|\,\Theta_i] = \Theta_i,\quad\operatorname{Var}(F_{ij}\,|\,\Theta_i) = \Theta_i/v_{ij}. [[/math]]
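To spell out the calculation: conditional on [math]\Theta_i[/math], the claim count [math]N_{ij}[/math] is Poisson, so its mean and variance both equal [math]v_{ij}\Theta_i[/math]. Since [math]F_{ij} = N_{ij}/v_{ij}[/math],

[[math]] \operatorname{E}[F_{ij}\,|\,\Theta_i] = \frac{\operatorname{E}[N_{ij}\,|\,\Theta_i]}{v_{ij}} = \Theta_i,\quad \operatorname{Var}(F_{ij}\,|\,\Theta_i) = \frac{\operatorname{Var}(N_{ij}\,|\,\Theta_i)}{v_{ij}^2} = \frac{v_{ij}\Theta_i}{v_{ij}^2} = \frac{\Theta_i}{v_{ij}}. [[/math]]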


Since this model is a special case of the Bühlmann-Straub model, the credibility estimator is (see \ref{bs-cred-estimator})

[[math]] \begin{equation} \label{bs-cred-estimator-freq} \hat{\mu}(\Theta_i) = (1- \alpha_i) \mu + \alpha_i \overline{F}_i \end{equation} [[/math]]

with

[[math]] \begin{align} \label{claim-freq-kappa} \kappa &= \sigma^2/\rho^2 = \operatorname{E}[\Theta_i]/\operatorname{Var}(\Theta_i) = \mu/\operatorname{Var}(\Theta_i) \\ \overline{F}_i &= v_i^{-1} \sum_{j=1}^{n_i} N_{ij}. \end{align} [[/math]]

Poisson Gamma Model

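If [math]\Theta_i[/math] is Gamma distributed with shape parameter [math]\alpha[/math] and scale parameter [math]\beta[/math] (so that [math]\operatorname{E}[\Theta_i] = \alpha\beta[/math] and [math]\operatorname{Var}(\Theta_i) = \alpha\beta^2[/math]; the shape [math]\alpha[/math] should not be confused with the credibility factor [math]\alpha_i[/math]), then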

[[math]] \mu = \alpha \beta,\, \kappa = \beta^{-1}, \, \alpha_i = \frac{v_i}{v_i + \beta^{-1}}. [[/math]]

The Bayesian credibility estimator for the Poisson-Gamma Model equals:

[[math]] \frac{\alpha + v_i\overline{F}_i}{v_i + \beta^{-1}} = \mu (1 - \alpha_i) + \alpha_i \overline{F}_i. [[/math]]

For the Poisson-Gamma model for claim frequencies, the Bühlmann credibility estimator equals the Bayesian credibility estimator. It should be noted that we didn't need to perform any algebraic manipulations to show the equality of the estimators: the Bühlmann credibility estimator is the best linear approximation to the Bayesian credibility estimator, so if the Bayesian credibility estimator is already linear then it must equal the Bühlmann credibility estimator.

If the Bayesian credibility estimator is linear, then it must equal the Bühlmann credibility estimator.
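The equality of the two estimators in the Poisson-Gamma model can also be checked numerically. The sketch below evaluates both sides of the identity for one risk class; the function names and the prior parameters, claim counts and exposures in the example are hypothetical.

```python
from typing import Sequence


def bayes_poisson_gamma(
    shape: float, scale: float, claims: Sequence[float], exposures: Sequence[float]
) -> float:
    """Posterior mean of Theta: (shape + total claims) / (total exposure + 1/scale)."""
    return (shape + sum(claims)) / (sum(exposures) + 1.0 / scale)


def buhlmann_poisson_gamma(
    shape: float, scale: float, claims: Sequence[float], exposures: Sequence[float]
) -> float:
    """Buhlmann-Straub estimate with mu = shape * scale and kappa = 1 / scale."""
    v_i = sum(exposures)
    mu = shape * scale
    kappa = 1.0 / scale
    alpha_i = v_i / (v_i + kappa)
    f_bar = sum(claims) / v_i      # observed claim frequency per exposure unit
    return (1 - alpha_i) * mu + alpha_i * f_bar


# Hypothetical inputs: Gamma(shape=2, scale=0.05) prior, three periods.
claims, exposures = [3, 1, 6], [20.0, 25.0, 35.0]
print(bayes_poisson_gamma(2.0, 0.05, claims, exposures))
print(buhlmann_poisson_gamma(2.0, 0.05, claims, exposures))
```

Both calls return the same value up to floating-point rounding, as the argument above predicts.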

Estimating Parameters of Interest

The credibility estimators presented so far depend on the parameters [math]\kappa[/math] and [math]\mu[/math] which could be unknown or difficult to compute; consequently, it would be useful to estimate these parameters based on the available data. In what follows, we assume the Bühlmann-Straub model holds for the data/observations.

Estimating μ

Since [math]\mu[/math] is the unconditional mean and the expectation of each observation equals [math]\mu[/math], a suitable unbiased estimator for [math]\mu[/math] is the exposure-weighted average of the observations:

[[math]] \begin{equation} \hat{\mu} =\sum_{i=1}^I \frac{v_i}{v}\overline{X}_i= \sum_{i=1}^{I}\sum_{j=1}^{n_i}\frac{v_{ij}}{v} X_{ij} \,, \,\, v = \sum_{i=1}^I v_i = \sum_{i=1}^I\sum_{j=1}^{n_i}v_{ij}\,. \end{equation} [[/math]]

Even though the estimator above is unbiased, the recommended estimator for the unconditional mean is

[[math]] \hat \mu = \frac{\sum_{i=1}^I \overline{X}_i Z_i}{\sum_{i=1}^I Z_i} [[/math]]

where [math]Z_i = \alpha_i = v_i/(v_i + \kappa)[/math] are the credibility weights. The estimator above is the best estimator in the following sense:

[[math]] \hat \mu = \underset{Y \in \mathcal{F}}{\operatorname{argmin}} \operatorname{E}[\left(Y - \mu \right)^2],\, \mathcal{F} = \{Y = \sum_{i,j}a_{i,j}X_{i,j} \, | \, \sum_{i,j} a_{i,j} = 1\}. [[/math]]

In other words, the estimator minimizes the mean square error among all linear combinations of the data [math]X_{i,j}[/math] whose coefficients sum to one. Since we don't know the parameters [math]\sigma^2 [/math] and [math]\rho^2[/math], we replace [math]Z_i[/math] with the estimated weight [math]\hat Z_i = v_i/(v_i + \hat{\kappa})[/math]:

[[math]] \begin{equation} \label{uncond-mean-est}\hat \mu = \frac{\sum_{i=1}^I \overline{X}_i \hat Z_i}{\sum_{i=1}^I \hat Z_i}. \end{equation} [[/math]]
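The two estimators of [math]\mu[/math] can be compared with a short sketch. The data layout (one list of observations and one list of exposures per risk class), the function names and the example values are assumptions made for illustration; the value of [math]\hat{\kappa}[/math] is taken as given.

```python
from typing import Sequence


def exposure_weighted_mean(xs: Sequence[Sequence[float]], vs: Sequence[Sequence[float]]) -> float:
    """Unbiased estimator: sum of v_ij * X_ij divided by total exposure v."""
    total_v = sum(sum(row) for row in vs)
    total = sum(v * x for row_x, row_v in zip(xs, vs) for x, v in zip(row_x, row_v))
    return total / total_v


def credibility_weighted_mean(
    xs: Sequence[Sequence[float]], vs: Sequence[Sequence[float]], kappa: float
) -> float:
    """Estimator sum_i Z_i * Xbar_i / sum_i Z_i with Z_i = v_i / (v_i + kappa)."""
    num = 0.0
    den = 0.0
    for row_x, row_v in zip(xs, vs):
        v_i = sum(row_v)
        x_bar_i = sum(v * x for x, v in zip(row_x, row_v)) / v_i
        z_i = v_i / (v_i + kappa)
        num += z_i * x_bar_i
        den += z_i
    return num / den


# Hypothetical data: three risk classes, two periods each.
xs = [[1.0, 3.0], [4.0, 6.0], [9.0, 7.0]]
vs = [[10.0, 10.0], [20.0, 20.0], [10.0, 30.0]]
print(exposure_weighted_mean(xs, vs))
print(credibility_weighted_mean(xs, vs, kappa=5.0))
```

For very large [math]\kappa[/math] the credibility weights are nearly proportional to [math]v_i[/math] and the two estimators almost coincide; for very small [math]\kappa[/math] the second estimator approaches the unweighted average of the class means.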

Estimating σ2

Recall that [math]\sigma^2[/math] is the expectation of the conditional variance; consequently, if we can estimate [math]\sigma^2(\Theta_i)[/math] for each [math]i[/math] then we average out all these estimates to get an estimate for [math]\sigma^2[/math]. An unbiased estimator for [math]\sigma^2(\Theta_i)[/math] is

[[math]] \hat{\sigma}^2(\Theta_i) = \frac{1}{n_i -1}\sum_{j=1}^{n_i}v_{ij}(X_{ij} - \overline{X}_i)^2 [[/math]]

and thus an unbiased estimator for [math]\sigma^2[/math] is given by

[[math]] \hat{\sigma}^2 = \frac{1}{N - I}\sum_{i=1}^I\sum_{j=1}^{n_i}v_{ij}(X_{ij} - \overline{X}_i)^2 \, , \,\, N = \sum_{i=1}^In_i \,\,. [[/math]]
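A minimal sketch of the pooled estimator follows. The nested-list data layout, the function name `epv_estimate` and the example values are illustrative assumptions.

```python
from typing import Sequence


def epv_estimate(xs: Sequence[Sequence[float]], vs: Sequence[Sequence[float]]) -> float:
    """Pooled estimator of sigma^2: within-class weighted squares over N - I."""
    within_ss = 0.0
    n_obs = 0
    for row_x, row_v in zip(xs, vs):
        v_i = sum(row_v)
        x_bar_i = sum(v * x for x, v in zip(row_x, row_v)) / v_i
        within_ss += sum(v * (x - x_bar_i) ** 2 for x, v in zip(row_x, row_v))
        n_obs += len(row_x)
    degrees = n_obs - len(xs)      # N - I
    if degrees <= 0:
        raise ValueError("need at least one class with two or more observations")
    return within_ss / degrees


# Hypothetical data: three risk classes, two periods each.
xs = [[1.0, 3.0], [4.0, 6.0], [9.0, 7.0]]
vs = [[10.0, 10.0], [20.0, 20.0], [10.0, 30.0]]
print(epv_estimate(xs, vs))
```

In this example the within-class weighted sums of squares are 20, 40 and 30, and [math]N - I = 3[/math], so the estimate is 30.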

Estimating ρ2

Recall that [math]\rho^2[/math] is the variance of the conditional means, [math]\operatorname{Var}[\mu(\Theta_i)][/math]. A natural approach is to estimate [math]\mu(\Theta_i)[/math] for each [math]i[/math] and then calculate a weighted variance of these estimates. As in the section on estimating μ, we use [math]\overline{X}_i[/math] to estimate [math]\mu(\Theta_i)[/math] and the exposure-weighted mean [math]\overline{X} = \sum_{i=1}^I (v_i/v)\overline{X}_i[/math] to estimate [math]\mu[/math] (the expectation of each [math]\overline{X}_i[/math]); consequently, if the [math]v_i[/math] are equal across risk classes, the following is a natural candidate estimator for [math]\rho^2[/math]:

[[math]] \hat{\rho}_1^2 = \frac{1}{I - 1} \sum_{i=1}^{I}\left(\overline{X}_i - \overline{X} \right)^2. [[/math]]

Unfortunately, a straightforward calculation shows that [math]\hat{\rho}_1^2[/math] is a biased estimator for [math]\rho^2[/math]. Writing [math]v_1[/math] for the common value of the [math]v_i[/math],

[[math]] \begin{equation} \label{rho-est-exp} \operatorname{E}[\hat{\rho}^2_1] = \operatorname{Var}[\overline{X}_i] = \frac{\sigma^2}{v_1} + \rho^2. \end{equation} [[/math]]

Since [math]\hat{\sigma}^2[/math] is an unbiased estimator for [math]\sigma^2[/math], equation \ref{rho-est-exp} shows that

[[math]] \hat{\rho}^2 = \hat{\rho}_1^2 - \frac{\hat{\sigma}^2}{v_1} [[/math]]

is an unbiased estimator for [math]\rho^2[/math]. The general case is a little more complicated and requires some delicate calculations. We have the following proposition:

Proposition (Unbiasedness of [math]\hat{\rho}_2^2[/math])

The estimator

[[math]] \hat{\rho}_2^2 = c \left (\hat{\rho}_1^2 - \frac{I \hat{\sigma}^2}{v} \right)\,, \, \hat{\rho}_1^2 = \frac{I}{I-1} \sum_{i=1}^{I}\frac{v_i}{v}\left(\overline{X}_i - \overline{X} \right)^2 [[/math]]
with
[[math]] c = \frac{I-1}{I} \left[ \sum_{i=1}^I \frac{v_i}{v} \left(1 - \frac{v_i}{v} \right) \right]^{-1} [[/math]]
is an unbiased estimator for [math]\rho^2[/math].

Proof

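Subtracting [math]\mu[/math] from every observation changes neither the differences [math]\overline{X}_i - \overline{X}[/math] nor any variance, so we may assume [math]\mu = 0[/math]. Then [math]\operatorname{E}[\,\overline{X}_i\overline{X}_k\,] = 0[/math] for [math]i \neq k[/math], because distinct risk classes are independent, and [math]\operatorname{E}[\,\overline{X}_i^2\,] = \operatorname{Var}(\overline{X}_i) = \sigma^2/v_i + \rho^2[/math]. Using [math]v\overline{X} = \sum_{i} v_i \overline{X}_i[/math], we have

[[math]] \begin{align*} \operatorname{E}\left [ \sum_{i=1}^Iv_i (\,\overline{X}_i - \overline{X} \,)^2 \right] &= \operatorname{E}\left [\sum_{i=1}^I v_i \overline{X}_i^2 -2\overline{X}\sum_{i=1}^Iv_i \overline{X}_i + v \overline{X}^2 \right] \\ &= \operatorname{E}\left [\sum_{i=1}^I v_i \overline{X}_i^2 -v^{-1}\Big(\sum_{i=1}^I v_i \overline{X}_i\Big)^2 \right] \\ &= \sum_{i=1}^I v_i(1 - \frac{v_i}{v})\operatorname{E}[\,\overline{X}_i^2\,] \\ &= \sum_{i=1}^I v_i(1 - \frac{v_i}{v})(\frac{\sigma^2}{v_i} + \rho^2) \\ &= (I-1)\sigma^2+\rho^2 \sum_{i=1}^I v_i(1 - \frac{v_i}{v}) \end{align*} [[/math]]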

and thus if

[[math]] \hat \rho^2_1 = \frac{I}{I-1} \sum_{i=1}^{I}\frac{v_i}{v}\left(\overline{X}_i - \overline{X} \right)^2 [[/math]]

then

[[math]] \operatorname{E}[\hat \rho^2_1] = \frac{I\sigma^2}{v}+\rho^2 c^{-1}\,,\, c = \frac{I-1}{I} \left[ \sum_{i=1}^I \frac{v_i}{v} \left(1 - \frac{v_i}{v} \right) \right]^{-1}. [[/math]]

Hence

[[math]] c \left[\hat \rho^2_1 - \frac{I\hat{\sigma}^2}{v}\right] [[/math]]

is an unbiased estimator for [math]\rho^2[/math]. ■

Since [math]\hat{\rho}_2^2[/math] can be negative while [math]\rho^2[/math] cannot, the estimator used in practice is truncated at zero (which gives up exact unbiasedness):

[[math]] \hat{\rho}^2 = \max(0,\hat{\rho}_2^2). [[/math]]

If [math]\rho^2[/math] is estimated to be 0, then every credibility factor is estimated to be 0 and the credibility estimate for each risk class is [math]\mu[/math] (or [math]\hat{\mu}[/math] when the unconditional mean isn't known).
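The truncated estimator of the proposition can be sketched as follows. The function takes the class means [math]\overline{X}_i[/math], the class exposures [math]v_i[/math] and an estimate [math]\hat{\sigma}^2[/math] as inputs; the function name `vhm_estimate` and the example values are hypothetical.

```python
from typing import Sequence


def vhm_estimate(x_bars: Sequence[float], v_is: Sequence[float], sigma2_hat: float) -> float:
    """Return max(0, c * (rho1_sq - I * sigma2_hat / v)) as in the proposition."""
    n_classes = len(x_bars)                 # I
    if n_classes < 2:
        raise ValueError("need at least two risk classes")
    v = sum(v_is)
    x_bar = sum(v_i * x for v_i, x in zip(v_is, x_bars)) / v
    rho1_sq = (n_classes / (n_classes - 1)) * sum(
        (v_i / v) * (x - x_bar) ** 2 for v_i, x in zip(v_is, x_bars)
    )
    c = ((n_classes - 1) / n_classes) / sum((v_i / v) * (1 - v_i / v) for v_i in v_is)
    rho2_sq = c * (rho1_sq - n_classes * sigma2_hat / v)
    return max(0.0, rho2_sq)                # truncate at zero


# Hypothetical inputs: three classes with means 2, 5, 7.5 and exposures 20, 40, 40.
print(vhm_estimate([2.0, 5.0, 7.5], [20.0, 40.0, 40.0], sigma2_hat=30.0))
```

A large [math]\hat{\sigma}^2[/math] relative to the spread of the class means drives the untruncated value below zero, in which case the function returns 0.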

Estimating κ

Provided [math]\hat{\rho}^2 \gt 0[/math], an estimate for [math]\kappa[/math] can be obtained by dividing the estimate for [math]\sigma^2[/math] by the estimate for [math]\rho^2[/math]:

[[math]] \hat{\kappa} = \hat{\sigma}^2/\hat{\rho}^2\,\,. [[/math]]
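Once [math]\hat{\sigma}^2[/math] and [math]\hat{\rho}^2[/math] are available, the estimated credibility factors follow directly. The sketch below treats the case [math]\hat{\rho}^2 = 0[/math] as an infinite [math]\hat{\kappa}[/math] with zero credibility, which is a convention assumed here; the function name `credibility_factors` and the inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def credibility_factors(
    v_is: Sequence[float], sigma2_hat: float, rho2_hat: float
) -> Tuple[float, List[float]]:
    """Return (kappa_hat, [v_i / (v_i + kappa_hat) for each class])."""
    if rho2_hat <= 0:
        # Assumed convention: no estimated variation between classes,
        # so kappa_hat is infinite and no class receives any credibility.
        return math.inf, [0.0 for _ in v_is]
    kappa_hat = sigma2_hat / rho2_hat
    return kappa_hat, [v_i / (v_i + kappa_hat) for v_i in v_is]


# Hypothetical inputs: exposures 20, 40, 40 with sigma2_hat = 30 and rho2_hat = 5.53125.
kappa_hat, factors = credibility_factors([20.0, 40.0, 40.0], 30.0, 5.53125)
print(kappa_hat, factors)
```

Classes with more exposure receive a factor closer to 1, so their own experience carries more weight in the credibility estimate.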

Semiparametric Estimation

As we have seen (recall Application to Claim Frequencies), it is sometimes the case that the conditional distribution of [math]X_{ij}[/math] given [math]\Theta_i[/math] has an explicit (parametric) representation and this can yield a simpler representation for [math]\kappa[/math]. Consider the standard example covered in Application to Claim Frequencies where the claim frequencies are Poisson distributed with mean [math]\Theta_i[/math]. Equation \ref{claim-freq-kappa} shows that a suitable estimator for [math]\kappa[/math] is given by

[[math]] \hat{\kappa} = \hat{\mu}/\hat{\rho}^2 [[/math]]

and thus we don't have to estimate [math]\sigma^2[/math] separately. The estimator [math]\hat \mu [/math] is set to the exposure-weighted mean [math]\overline{X}[/math], since the credibility-weighted estimator \ref{uncond-mean-est} itself requires [math]\hat \kappa [/math].
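A sketch of the semiparametric calculation for Poisson claim counts is given below. It assumes that [math]\hat{\rho}^2[/math] is computed from the proposition with [math]\hat{\sigma}^2[/math] replaced by [math]\hat{\mu}[/math] (justified by [math]\sigma^2 = \mu[/math] in the Poisson case); this substitution is an assumption of the sketch rather than something stated above. The function name `semiparametric_kappa` and the data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def semiparametric_kappa(
    claims: Sequence[Sequence[float]], exposures: Sequence[Sequence[float]]
) -> Tuple[float, float, float]:
    """Return (mu_hat, rho2_hat, kappa_hat) for Poisson claim counts."""
    n_classes = len(claims)                            # I
    v_is = [sum(row) for row in exposures]
    v = sum(v_is)
    f_bars = [sum(row) / v_i for row, v_i in zip(claims, v_is)]
    mu_hat = sum(sum(row) for row in claims) / v       # exposure-weighted mean
    rho1_sq = (n_classes / (n_classes - 1)) * sum(
        (v_i / v) * (f - mu_hat) ** 2 for v_i, f in zip(v_is, f_bars)
    )
    c = ((n_classes - 1) / n_classes) / sum((v_i / v) * (1 - v_i / v) for v_i in v_is)
    # Assumption: sigma2_hat is replaced by mu_hat because sigma^2 = mu here.
    rho2_hat = max(0.0, c * (rho1_sq - n_classes * mu_hat / v))
    kappa_hat = mu_hat / rho2_hat if rho2_hat > 0 else math.inf
    return mu_hat, rho2_hat, kappa_hat


# Hypothetical claim counts and exposures for three risk classes.
claims = [[1, 3], [6, 10], [9, 21]]
exposures = [[10.0, 10.0], [20.0, 20.0], [10.0, 30.0]]
print(semiparametric_kappa(claims, exposures))
```

Only claim counts and exposures enter the calculation; no within-class variance estimate is needed.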

Wikipedia References

  • Bühlmann, Hans (1967). "Experience rating and credibility". ASTIN Bulletin. 4 (3): 199–207.
  • Wikipedia contributors. "Bühlmann model". Wikipedia. Retrieved 23 October 2020.