Jun 25'23

Find the lasso regression solution for the data below for a general value of $\lambda_1$ and for the straight line model $Y = \beta_0 + \beta_1 X + \varepsilon$ (only apply the lasso penalty to the slope parameter, not to the intercept). Show that when $\lambda_1$ is chosen as 14, the lasso solution fit is $\hat{Y} = 40 + 1.75 X$. Data: $\mathbf{X}^{\top} = (X_1, X_2, \ldots, X_{8})^{\top} = (-2, -1, -1, -1, 0, 1, 2, 2)^{\top}$, and $\mathbf{Y}^{\top} = (Y_1, Y_2, \ldots, Y_{8})^{\top} = (35, 40, 36, 38, 40, 43, 45, 43)^{\top}$.
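Since the covariate values sum to zero, the unpenalized intercept equals $\bar{Y}$ and the slope follows by soft-thresholding; a minimal numerical sketch (assuming, as elsewhere in these exercises, the criterion $\|\mathbf{Y} - \beta_0 - \beta_1 \mathbf{X}\|_2^2 + \lambda_1 |\beta_1|$, so the threshold is $\lambda_1/2$):

```python
import numpy as np

# Data from the exercise
x = np.array([-2, -1, -1, -1, 0, 1, 2, 2], dtype=float)
y = np.array([35, 40, 36, 38, 40, 43, 45, 43], dtype=float)
lam = 14.0

# x sums to zero, so the unpenalized intercept is the mean of y
b0 = y.mean()
# soft-thresholding solution of min ||y - b0 - b1*x||^2 + lam*|b1|
s = x @ (y - b0)                       # = 35 here
b1 = np.sign(s) * max(abs(s) - lam / 2, 0.0) / (x @ x)
print(b0, b1)                          # 40.0 1.75
```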

Jun 25'23

Consider the standard linear regression model $Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i$ for $i=1, \ldots, n$ and with $\varepsilon_i \sim_{i.i.d.} \mathcal{N}(0, \sigma^2)$. The model comprises a single covariate and, depending on the subquestion, an intercept. Data on the response and the covariate are: $\{(y_i, x_{i,1})\}_{i=1}^4 = \{ (1.4, 0.0), (1.4, -2.0), (0.8, 0.0), (0.4, 2.0) \}$.

• Evaluate the lasso regression estimator of the model without intercept for the data at hand with $\lambda_1 = 0.2$.
• Evaluate the lasso regression estimator of the model with intercept for the data at hand with $\lambda_1 = 0.2$, where the penalty does not apply to the intercept (which is left unpenalized).
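For both subquestions the single-covariate lasso solution can be checked numerically via soft-thresholding (assuming the criterion $\|\mathbf{Y}-\mathbf{X}\bbeta\|_2^2 + \lambda_1\|\bbeta\|_1$, so the threshold is $\lambda_1/2$); a sketch:

```python
import numpy as np

# Data from the exercise
x = np.array([0.0, -2.0, 0.0, 2.0])
y = np.array([1.4, 1.4, 0.8, 0.4])
lam = 0.2

def soft(z, t):
    """Soft-thresholding operator sgn(z)(|z| - t)_+."""
    return np.sign(z) * max(abs(z) - t, 0.0)

# (a) no intercept: minimise ||y - b*x||^2 + lam*|b|
b_no_int = soft(x @ y, lam / 2) / (x @ x)

# (b) unpenalized intercept: x sums to zero, so b0 = mean(y)
b0 = y.mean()
b_int = soft(x @ (y - b0), lam / 2) / (x @ x)
print(b_no_int, b0, b_int)             # both slopes equal -0.2375 here
```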

Plot the regularization path of the lasso regression estimator over the range $\lambda_1 \in (0, 160]$ using the data of Example.
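The referenced example's data are not reproduced here; as an illustration only, the sketch below traces a single-covariate regularization path over the requested range, using the data of the first exercise above as a placeholder:

```python
import numpy as np

# Placeholder data (from the first exercise above) -- substitute the
# referenced example's own response and covariate.
x = np.array([-2., -1., -1., -1., 0., 1., 2., 2.])
y = np.array([35., 40., 36., 38., 40., 43., 45., 43.])

lambdas = np.linspace(1e-6, 160.0, 400)
b0 = y.mean()                          # unpenalized intercept (x is centered)
s = x @ (y - b0)
path = np.sign(s) * np.maximum(np.abs(s) - lambdas / 2, 0.0) / (x @ x)

# e.g. plot with matplotlib:
# import matplotlib.pyplot as plt
# plt.plot(lambdas, path); plt.xlabel(r"$\lambda_1$"); plt.ylabel(r"$\hat{\beta}_1(\lambda_1)$")
print(path[0], path[-1])               # shrinks to exactly zero within the range
```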


Consider the standard linear regression model $Y_i = X_{i,1} \beta_1 + X_{i,2} \beta_2 + \varepsilon_i$ for $i=1, \ldots, n$ and with the $\varepsilon_i$ i.i.d. normally distributed with zero mean and some known common variance. In the estimation of the regression parameter $(\beta_1, \beta_2)^{\top}$ a lasso penalty is used: $\lambda_{1,1} | \beta_1 | + \lambda_{1,2} | \beta_2 |$ with penalty parameters $\lambda_{1,1}, \lambda_{1,2} \gt 0$.

• Let $\lambda_{1,1} = \lambda_{1,2}$ and assume the covariates are orthogonal with the spread of the first covariate being much larger than that of the second. Draw a plot with $\beta_1$ and $\beta_2$ on the $x$- and $y$-axis, respectively. Sketch the parameter constraint as implied by the lasso penalty. Add the level sets of the sum-of-squares, $\| \mathbf{Y} - \mathbf{X} \bbeta \|_2^2$, loss criterion. Use the plot to explain why the lasso tends to select covariates with larger spread.
• Assume the covariates to be orthonormal. Let $\lambda_{1,2} \gg \lambda_{1,1}$. Redraw the plot of part a of this exercise. Use the plot to explain the effect of differing $\lambda_{1,1}$ and $\lambda_{1,2}$ on the resulting lasso estimate.
• Show that the two cases (i.e. the assumptions on the covariates and penalty parameters) of parts a and b of this exercise are equivalent, in the sense that either loss function can be rewritten as the other.
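Part c can be spot-checked numerically: with orthonormal covariates the lasso separates per coordinate, and rescaling the second column by $\lambda_{1,1}/\lambda_{1,2}$ turns the weighted penalty into an unweighted one. A sketch (the random data and penalty values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X, _ = np.linalg.qr(rng.normal(size=(n, 2)))   # orthonormal columns
y = X @ np.array([2.0, 1.0]) + rng.normal(scale=0.1, size=n)

def soft(z, t):
    """Soft-thresholding operator sgn(z)(|z| - t)_+ (vectorized)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

lam = np.array([0.5, 4.0])                     # lam_{1,2} >> lam_{1,1}
# orthogonal design => the lasso separates per coordinate
b_weighted = soft(X.T @ y, lam / 2) / np.sum(X**2, axis=0)

# equivalent problem: rescale column 2 and use the single penalty lam[0]
c = lam[0] / lam[1]
Xs = X * np.array([1.0, c])                    # smaller spread in column 2
g = soft(Xs.T @ y, lam[0] / 2) / np.sum(Xs**2, axis=0)
b_equiv = g * np.array([1.0, c])               # map back: beta_2 = c * gamma_2
print(np.allclose(b_weighted, b_equiv))        # True
```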

Investigate the effect of the variance of the covariates on variable selection by the lasso. Hereto consider the toy model: $Y_i = X_{1i} + X_{2i} + \varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, 1)$, $X_{1i} \sim \mathcal{N}(0, 1)$, and $X_{2i} = a \, X_{1i}$ with $a \in [0, 2]$. Draw a hundred samples for both $X_{1i}$ and $\varepsilon_i$ and construct both $X_{2i}$ and $Y_i$ for a grid of $a$'s. Fit the model by means of the lasso regression estimator with $\lambda_1=1$ for each choice of $a$. Plot, e.g. in one figure, a) the variance of $X_{1i}$, b) the variance of $X_{2i}$, and c) the indicator of the selection of $X_{2i}$. Which covariate is selected for which values of scale parameter $a$?
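A simulation sketch of the above; a hand-rolled coordinate descent stands in for a lasso solver so the snippet is self-contained (grid, seed, and iteration count are assumptions):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=2000):
    """Coordinate descent for min ||y - Xb||^2 + lam*||b||_1."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = y - X @ b + X[:, j] * b[j]          # partial residual
            z = X[:, j] @ r
            b[j] = np.sign(z) * max(abs(z) - lam / 2, 0.0) / (X[:, j] @ X[:, j])
    return b

rng = np.random.default_rng(7)                      # seed is an assumption
n, lam = 100, 1.0
x1 = rng.normal(size=n)
eps = rng.normal(size=n)
sel2 = []
for a in np.linspace(0.05, 2.0, 40):
    x2 = a * x1
    y = x1 + x2 + eps
    b = lasso_cd(np.column_stack([x1, x2]), y, lam)
    sel2.append(abs(b[1]) > 1e-8)                   # is X_2 selected?
# X_2 tends to enter the model once its spread exceeds that of X_1 (a > 1)
print(sel2[0], sel2[-1])
```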


Show the non-uniqueness of the lasso regression estimator for $p \gt 2$ when the design matrix $\mathbf{X}$ contains linearly dependent columns.
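Not the requested proof, but a quick numerical illustration: when two columns coincide, the lasso loss depends on their coefficients only through their sum (for equal signs), so coefficient mass can be redistributed over the copies without changing the loss (the data below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
X = np.column_stack([x, x, rng.normal(size=n)])   # columns 1 and 2 coincide
y = 2 * x + rng.normal(size=n)
lam = 1.0

def lasso_loss(b):
    """||y - Xb||^2 + lam * ||b||_1."""
    return np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

# same fit (1.5*x = 0.5*x + 1.0*x) and same L1 norm (1.7 in both cases)
b_a = np.array([1.5, 0.0, 0.2])
b_b = np.array([0.5, 1.0, 0.2])
print(np.isclose(lasso_loss(b_a), lasso_loss(b_b)))  # True
```

Hence, if a minimizer puts nonzero mass on linearly dependent columns, a whole segment of parameter vectors attains the same minimal loss.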


Consider the linear regression model $\mathbf{Y} = \mathbf{X} \bbeta + \vvarepsilon$ with $\vvarepsilon \sim \mathcal{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_{nn})$ and an $n \times 2$-dimensional design matrix with zero-centered and standardized but collinear columns, i.e.:

[$] \begin{eqnarray*} \mathbf{X}^{\top} \mathbf{X} & = & \left( \begin{array}{ll} 1 & \rho \\ \rho & 1 \end{array} \right) \end{eqnarray*} [$]

with $\rho \in (-1, 1)$. Then, an analytic expression for the lasso regression estimator exists. Show that:

[$] \begin{eqnarray*} \hat{\beta}_j (\lambda_1) & = & \left\{ \begin{array}{ll} \mbox{sgn}(\hat{\beta}_j) [| \hat{\beta}_j | - \tfrac{1}{2} \lambda_1 (1+\rho)^{-1}]_+ & \mbox{ if } \, \mbox{sgn}[\hat{\beta}_1 (\lambda_1)] = \mbox{sgn}[\hat{\beta}_2 (\lambda_1)], \\ & \hat{\beta}_1 (\lambda_1) \not= 0 \not= \hat{\beta}_2 (\lambda_1), \\ \mbox{sgn}(\hat{\beta}_j) [| \hat{\beta}_j | - \tfrac{1}{2} \lambda_1 (1-\rho)^{-1}]_+ & \mbox{ if } \, \mbox{sgn}[\hat{\beta}_1 (\lambda_1)] \not= \mbox{sgn}[\hat{\beta}_2 (\lambda_1)], \\ & \hat{\beta}_1 (\lambda_1) \not= 0 \not= \hat{\beta}_2 (\lambda_1), \\ \left\{ \begin{array}{lcl} 0 & \mbox{ if } & j \not= \arg \max_{j'} \{ | \hat{\beta}_{j'}^{\mbox{{\tiny (ols)}}} | \} \\ \mbox{sgn}(\tilde{\beta}_j) ( | \tilde{\beta}_j | - \tfrac{1}{2} \lambda_1)_+ & \mbox{ if } & j = \arg \max_{j'} \{ | \hat{\beta}_{j'}^{\mbox{{\tiny (ols)}}} | \} \end{array} \right. & \mbox{ otherwise, } \end{array} \right. \end{eqnarray*} [$]

where $\tilde{\beta}_j = (\mathbf{X}_{\ast,j}^{\top} \mathbf{X}_{\ast,j})^{-1} \mathbf{X}_{\ast,j}^{\top} \mathbf{Y}$.
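The equal-sign branch can be spot-checked numerically against coordinate descent (the data and penalty value below are assumptions, and $\hat{\beta}_j$ is taken to be the OLS estimate):

```python
import numpy as np

rho, lam = 0.4, 0.3
XtX = np.array([[1.0, rho], [rho, 1.0]])
X = np.linalg.cholesky(XtX).T          # then X.T @ X equals XtX
y = np.array([1.0, 0.8])
b_ols = np.linalg.solve(XtX, X.T @ y)  # both elements positive here

def soft(z, t):
    """Soft-thresholding operator sgn(z)(|z| - t)_+."""
    return np.sign(z) * max(abs(z) - t, 0.0)

# claimed closed form when both lasso estimates are nonzero with equal signs
b_formula = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam / (2 * (1 + rho)), 0.0)

# compare against coordinate descent on the lasso loss
b = np.zeros(2)
for _ in range(500):
    for j in range(2):
        z = X[:, j] @ (y - X @ b + X[:, j] * b[j])
        b[j] = soft(z, lam / 2)        # X_j'X_j = 1 here
print(np.allclose(b, b_formula, atol=1e-8))   # True
```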


Consider the standard linear regression model $Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i$ for $i=1, \ldots, n$ and with the $\varepsilon_i$ i.i.d. normally distributed with zero mean and a common variance. Moreover, $\mathbf{X}_{\ast,j} = \mathbf{X}_{\ast,j'}$ for all $j, j'=1, \ldots, p$ and $\sum_{i=1}^n X_{i,j}^2 = 1$. An earlier question revealed that in this case all elements of the ridge regression estimator are equal, irrespective of the choice of the penalty parameter $\lambda_2$. Does this hold for the lasso regression estimator? Motivate your answer.


Consider the linear regression model $Y_i = \mathbf{X}_{i,\ast} \bbeta + \varepsilon_i$ for $i=1, \ldots, n$ and with the $\varepsilon_i$ i.i.d. normally distributed with zero mean and a common variance. Relevant information on the response and design matrix is summarized as:

[$] \begin{eqnarray*} \mathbf{X}^{\top} \mathbf{X} = \left( \begin{array}{rr} 3 & -2 \\ -2 & 2 \end{array} \right), \qquad \mathbf{X}^{\top} \mathbf{Y} = \left( \begin{array}{r} 3 \\ -1 \end{array} \right). \end{eqnarray*} [$]

The lasso regression estimator is used to learn parameter $\bbeta$.

• Show that the lasso regression estimator is given by:
[$] \begin{eqnarray*} \hat{\bbeta}(\lambda_1) & = & \arg \min_{\bbeta \in \mathbb{R}^2} 3 \beta_1^2 + 2 \beta_2^2 - 4 \beta_1 \beta_2 - 6 \beta_1 + 2 \beta_2 + \lambda_1 | \beta_1 | + \lambda_1 | \beta_2|. \end{eqnarray*} [$]
• For $\lambda_{1} = 0.2$ the lasso estimate of the second element of $\bbeta$ is $\hat{\beta}_2(\lambda_1) = 1.25$. Determine the corresponding value of $\hat{\beta}_1(\lambda_1)$.
• Determine the smallest $\lambda_1$ for which it is guaranteed that $\hat{\bbeta}(\lambda_1) = \mathbf{0}_2$.
• Show that $\| \hat{\bbeta}(\lambda_1)\|_1$ is monotonically decreasing in $\lambda_1$. Hereto assume orthonormality of the design matrix $\mathbf{X}$.
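Parts b and c can be verified numerically with coordinate descent on the stated quadratic form (a sketch; the value 1.8 below is computed from the summary statistics, not given in the exercise):

```python
import numpy as np

# Summary statistics from the exercise
XtX = np.array([[3.0, -2.0], [-2.0, 2.0]])
Xty = np.array([3.0, -1.0])

def soft(z, t):
    """Soft-thresholding operator sgn(z)(|z| - t)_+."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(lam, n_iter=2000):
    """Coordinate descent needing only X'X and X'Y."""
    b = np.zeros(2)
    for _ in range(n_iter):
        for j in range(2):
            z = Xty[j] - XtX[j] @ b + XtX[j, j] * b[j]
            b[j] = soft(z, lam / 2) / XtX[j, j]
    return b

print(lasso(0.2))   # approx [1.8, 1.25]: part b
print(lasso(6.0))   # [0., 0.]: lambda_1 = 6 annihilates the estimate
```

The value $\lambda_1 = 6$ is where $\lambda_1 \geq 2 \max_j | \mathbf{X}_{\ast,j}^{\top} \mathbf{Y} |$ first holds, so both KKT conditions at $\bbeta = \mathbf{0}_2$ are satisfied.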