
Consider the linear hypothesis space consisting of linear maps parameterized by weights $\weights$. We try to find the best linear map by minimizing the average squared error loss (empirical risk), augmented by a regularizer, incurred on a training set

[$]\dataset \defeq \big \{ (\featurevec^{(1)},\truelabel^{(1)}),(\featurevec^{(2)},\truelabel^{(2)}),\ldots,(\featurevec^{(\samplesize)},\truelabel^{(\samplesize)}) \big \}.[$]

Ridge regression augments the average squared error loss on $\dataset$ by the regularizer $\| \weights \|^{2}$, yielding the following learning problem

[$] \min_{\weights \in \mathbb{R}^{\featurelen}} f(\weights) = (1/\samplesize)\sum_{\sampleidx=1}^{\samplesize}\big( \truelabel^{(\sampleidx)} - \weights^{T} \featurevec^{(\sampleidx)} \big)^{2} + \regparam \| \weights \|^{2}_{2}.[$]

Is it possible to rewrite the objective function $f(\weights)$ as a convex quadratic function $f(\weights) = \weights^{T} \mathbf{C} \weights + \vb^{T} \weights + c$?

If this is possible, how are the matrix $\mathbf{C}$, the vector $\vb$, and the constant $c$ related to the feature vectors and labels of the training data?
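One way to check such a rewriting numerically: expanding $(1/\samplesize)\|\mathbf{y}-\mathbf{X}\weights\|^{2}_{2}+\regparam\|\weights\|^{2}_{2}$ suggests $\mathbf{C} = (1/\samplesize)\mathbf{X}^{T}\mathbf{X} + \regparam \mathbf{I}$, $\vb = -(2/\samplesize)\mathbf{X}^{T}\mathbf{y}$, and $c = (1/\samplesize)\mathbf{y}^{T}\mathbf{y}$. The sketch below verifies this identity on randomly generated data (the sample size, feature length, and regularization value are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 10          # sample size and feature length (illustrative)
lam = 0.1             # regularization parameter (illustrative)
X = rng.normal(size=(m, n))   # rows are the feature vectors x^(i)
y = rng.normal(size=m)        # labels y^(i)

# Coefficients obtained by expanding the squared-error term:
C = X.T @ X / m + lam * np.eye(n)   # matrix C (positive definite for lam > 0)
b = -2.0 * X.T @ y / m              # vector b
c = y @ y / m                       # constant c

def f(w):
    """Ridge objective: average squared error plus regularizer."""
    return np.mean((y - X @ w) ** 2) + lam * (w @ w)

def quad(w):
    """Candidate quadratic form w^T C w + b^T w + c."""
    return w @ C @ w + b @ w + c

w = rng.normal(size=n)
assert np.isclose(f(w), quad(w))    # the two expressions agree
```

Since $\mathbf{C}$ is the sum of a positive semi-definite matrix and $\regparam \mathbf{I}$, it is positive definite for $\regparam > 0$, so $f$ is indeed a convex quadratic.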

Consider data points, each characterized by $\featurelen=10$ features $\featurevec \in \mathbb{R}^{\featurelen}$ and a single numeric label $\truelabel$. We want to learn a linear hypothesis $h(\featurevec) = \weights^{T} \featurevec$ by minimizing the average squared error on the training set $\dataset$ of size $\samplesize=4$. We could learn such a hypothesis by two approaches. The first approach is to split the dataset into a training set and a validation set. Then we consider all models consisting of linear hypotheses whose weight vectors have at most two non-zero weights. Each of these models corresponds to a different subset of two weights that may be non-zero.
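The first approach can be sketched as follows: enumerate every subset of two features, fit a least-squares hypothesis restricted to those two features on the training split, and keep the subset with the smallest validation error. The data below are synthetic, and the split sizes and the "true" feature indices are assumptions made purely for illustration:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
m, n = 4, 10                      # m = 4 data points, n = 10 features
X = rng.normal(size=(m, n))
# hypothetical ground truth that uses only features 2 and 7
y = 1.5 * X[:, 2] - 0.5 * X[:, 7]

# split: with m = 4, use 3 points for training and 1 for validation
X_tr, y_tr = X[:3], y[:3]
X_va, y_va = X[3:], y[3:]

best_err, best_subset = np.inf, None
for subset in combinations(range(n), 2):   # all C(10, 2) = 45 models
    cols = list(subset)
    # least-squares fit restricted to the two chosen features
    w, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
    err = np.mean((y_va - X_va[:, cols] @ w) ** 2)
    if err < best_err:
        best_err, best_subset = err, subset
```

Since the synthetic labels depend only on features 2 and 7, that subset achieves zero validation error, so `best_err` is (numerically) zero after the loop.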