Consider the Bayesian linear regression model $\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon$ with $\vvarepsilon \sim \mathcal{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})$, a multivariate normal conditional prior distribution on the regression parameter: $\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}^{-1})$, and an inverse gamma prior on the error variance: $\sigma^2 \sim \mathcal{IG}(\gamma, \delta)$. The consequences of various choices for the hyperparameters of the prior distribution on $\bbeta$ are studied.
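Throughout, the following conditional posterior of $\bbeta$, obtained by completing the square in the product of likelihood and prior (a standard conjugacy result), may be used:
\begin{eqnarray*}
\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0, \mathbf{\Delta} & \sim & \mathcal{N} \big[ (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1} (\mathbf{X}^{\top} \mathbf{Y} + \mathbf{\Delta} \bbeta_0), \, \sigma^2 (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1} \big].
\end{eqnarray*}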
• Consider the following conditional prior distributions on the regression parameters $\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}_a^{-1})$ and $\bbeta \, | \, \sigma^2 \sim \mathcal{N}(\bbeta_0, \sigma^2 \mathbf{\Delta}_b^{-1})$ with precision matrices $\mathbf{\Delta}_a, \mathbf{\Delta}_b \in \mathcal{S}_{++}^p$ such that $\mathbf{\Delta}_a \succeq \mathbf{\Delta}_b$, i.e. $\mathbf{\Delta}_a = \mathbf{\Delta}_b + \mathbf{D}$ for some symmetric positive semi-definite matrix $\mathbf{D}$ of appropriate dimensions. Verify:
\begin{eqnarray*}
\mbox{Var}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0, \mathbf{\Delta}_a) & \preceq & \mbox{Var}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0, \mathbf{\Delta}_b).
\end{eqnarray*}
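The Loewner ordering above can be checked numerically. The sketch below (all matrices and dimensions are illustrative choices, not part of the exercise) draws a random positive definite $\mathbf{\Delta}_b$, adds a random positive semi-definite increment $\mathbf{D}$, and confirms that the difference of the two conditional posterior variances $\sigma^2 (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1}$ has no negative eigenvalues:

```python
# Numerical sanity check (not a proof): with Delta_a = Delta_b + D for a
# positive semi-definite D, the posterior variance with Delta_a is smaller
# than that with Delta_b in the Loewner order.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 50, 4, 1.5                   # illustrative sizes and variance

X = rng.standard_normal((n, p))
A = rng.standard_normal((p, p))
Delta_b = A @ A.T + p * np.eye(p)           # positive definite precision matrix
B = rng.standard_normal((p, p))
D = B @ B.T                                 # positive semi-definite increment
Delta_a = Delta_b + D

var_a = sigma2 * np.linalg.inv(X.T @ X + Delta_a)
var_b = sigma2 * np.linalg.inv(X.T @ X + Delta_b)

# Loewner order var_a <= var_b: var_b - var_a must be positive semi-definite,
# i.e. its (symmetric) eigenvalues must all be nonnegative up to round-off.
eigvals = np.linalg.eigvalsh(var_b - var_a)
print("smallest eigenvalue of var_b - var_a:", eigvals.min())
```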
• In the remainder of this exercise assume $\mathbf{\Delta}_a = \mathbf{\Delta} = \mathbf{\Delta}_b$. Let $\bbeta_t$ be the ‘true’ or ‘ideal’ value of the regression parameter that was used in the generation of the data, and show that a better initial guess yields a higher posterior density at $\bbeta_t$. That is, take two prior mean parameters $\bbeta_0 = \bbeta_0^{\mbox{{\tiny (a)}}}$ and $\bbeta_0 = \bbeta_0^{\mbox{{\tiny (b)}}}$ such that the former is closer to $\bbeta_t$ than the latter. Here closeness is defined in terms of the Mahalanobis distance, which for, e.g., $\bbeta_t$ and $\bbeta_0^{\mbox{{\tiny (a)}}}$ is defined as $d_M(\bbeta_t, \bbeta_0^{\mbox{{\tiny (a)}}}; \mathbf{\Sigma}) = [(\bbeta_t - \bbeta_0^{\mbox{{\tiny (a)}}})^{\top} \mathbf{\Sigma}^{-1} (\bbeta_t - \bbeta_0^{\mbox{{\tiny (a)}}})]^{1/2}$ for a positive definite covariance matrix $\mathbf{\Sigma}$, here $\mathbf{\Sigma} = \sigma^2 \mathbf{\Delta}^{-1}$. Show that the posterior density $\pi_{\bbeta \, | \, \sigma^2} (\bbeta \, | \, \sigma^2, \mathbf{X}, \mathbf{Y}, \bbeta_0^{\mbox{{\tiny (a)}}}, \mathbf{\Delta})$ is larger at $\bbeta = \bbeta_t$ than with the other prior mean parameter. Moreover, verify that the corresponding posterior mean is also closer to $\bbeta_t$:
\begin{eqnarray*}
d_M[\bbeta_t, \mathbb{E}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0^{\mbox{{\tiny (a)}}}, \mathbf{\Delta}); \mathbf{\Sigma}] & \leq & d_M[\bbeta_t, \mathbb{E}(\bbeta \, | \, \sigma^2, \mathbf{Y}, \mathbf{X}, \bbeta_0^{\mbox{{\tiny (b)}}}, \mathbf{\Delta}); \mathbf{\Sigma}],
\end{eqnarray*}
now with $\mathbf{\Sigma} = \sigma^2 (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1}$, the conditional posterior variance of $\bbeta$.
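This inequality can be illustrated numerically. The sketch below is a simplified setting, not a proof: it uses noiseless data ($\mathbf{Y} = \mathbf{X} \bbeta_t$) for clarity, an illustrative choice of $\mathbf{\Delta}$, and places $\bbeta_0^{\mbox{{\tiny (a)}}}$ on the segment between $\bbeta_0^{\mbox{{\tiny (b)}}}$ and $\bbeta_t$, so that the former is closer to $\bbeta_t$ in any Mahalanobis distance. The posterior mean formula $(\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1} (\mathbf{X}^{\top} \mathbf{Y} + \mathbf{\Delta} \bbeta_0)$ is the standard conjugate-normal result:

```python
# Illustration (not a proof) of the posterior-mean inequality, on noiseless
# data and with beta_0^(a) strictly between beta_0^(b) and beta_t.
# All numbers below are illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 60, 3, 2.0

X = rng.standard_normal((n, p))
Delta = 2.0 * np.eye(p)                      # prior precision (illustrative)
beta_t = np.array([1.0, -2.0, 0.5])          # 'true' regression parameter
Y = X @ beta_t                               # noiseless response, for clarity

beta0_b = beta_t + np.array([3.0, 1.0, -2.0])
beta0_a = beta_t + 0.4 * (beta0_b - beta_t)  # closer to beta_t by construction

C = np.linalg.inv(X.T @ X + Delta)           # posterior variance, up to sigma^2
Sigma_inv = (X.T @ X + Delta) / sigma2       # inverse of Sigma = sigma^2 C

def d_M(u, v, S_inv):
    """Mahalanobis distance between u and v with precision matrix S_inv."""
    w = u - v
    return np.sqrt(w @ S_inv @ w)

def post_mean(beta0):
    """Conditional posterior mean (X'X + Delta)^{-1} (X'Y + Delta beta0)."""
    return C @ (X.T @ Y + Delta @ beta0)

d_a = d_M(beta_t, post_mean(beta0_a), Sigma_inv)
d_b = d_M(beta_t, post_mean(beta0_b), Sigma_inv)
print("d_a =", d_a, "<= d_b =", d_b, ":", d_a <= d_b)
```

In this noiseless setting $\bbeta_t - \mathbb{E}(\bbeta \, | \, \cdots) = (\mathbf{X}^{\top} \mathbf{X} + \mathbf{\Delta})^{-1} \mathbf{\Delta} (\bbeta_t - \bbeta_0)$, so the two deviations are proportional and the distance with prior mean (a) is exactly $0.4$ times that with prior mean (b).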