04. Shrinkage Methods

Shrinkage methods

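Why? Some variables might be redundant. Shrinking the coefficients toward zero reduces the variance of the fit at the cost of some bias, and can simplify the model.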

Ridge Regression

Lasso

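The lasso minimizes the RSS subject to $\sum_j \lvert \beta_j \rvert \le t$. Making $t$ sufficiently small causes some of the coefficients to be exactly 0, so the lasso performs variable selection and produces a sparse model.
The lasso is a convex optimization problem (a quadratic program). Unlike ridge, its solution has no closed form in general.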
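As a rough illustration, the sketch below solves the lasso in its equivalent penalized (Lagrangian) form, $\frac{1}{2}\sum_i (y_i - \sum_j x_{ij}\beta_j)^2 + \lambda \sum_j \lvert\beta_j\rvert$. Here a larger $\lambda$ plays the role of a smaller $t$. It uses cyclic coordinate descent with soft-thresholding. That is one of several ways to solve this convex problem and not necessarily the method the notes have in mind. It also fits no intercept. The function names `soft_threshold` and `lasso_cd` and the toy data are hypothetical.

```python
def soft_threshold(z, gamma):
    """Return sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0  # |z| <= gamma: the coordinate is set exactly to zero


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize 0.5 * sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|
    by cyclic coordinate descent (hypothetical helper, a sketch only).

    X is a list of rows (n x p), y a list of length n. No intercept is
    fitted; with real data one would center y and the columns of X first.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of column j with the partial residual,
            # i.e. the residual of the fit that leaves out coordinate j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Hypothetical toy data: y is roughly 2 * x_1 plus small perturbations.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
y = [2.1, -0.1, 0.2, 1.9, 0.1, 2.2]

print(lasso_cd(X, y, lam=0.0))  # close to least squares, about [2.04, -0.11, 0.19]
print(lasso_cd(X, y, lam=1.0))  # about [1.733, 0.0, 0.0]
```

With $\lambda = 0$ all three coefficients are nonzero. With $\lambda = 1$ the two coefficients whose partial-residual correlation stays below $\lambda$ are set to exactly 0, which is the selection effect described above.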

Why does the lasso lead to coefficients that are exactly 0?

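The reason becomes clear once you plot the constraint region together with the contours of the RSS (Fig. 3.11). The lasso region $\lvert\beta_1\rvert + \lvert\beta_2\rvert \le t$ is a diamond with corners on the axes. The elliptical RSS contours often first touch it at a corner, where one coefficient is exactly 0. The ridge region $\beta_1^2 + \beta_2^2 \le t^2$ is a disk with no corners, so the ridge solution rarely lands exactly on an axis.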
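The exact zeros also show up algebraically in the special case of an orthonormal design ($X^T X = I$). There, both estimators are simple functions of the least-squares coefficients $\hat\beta_j$:

- Ridge, with criterion RSS $+ \lambda\sum_j \beta_j^2$, scales each coefficient by $1/(1+\lambda)$.
- Lasso, with criterion $\frac{1}{2}$RSS $+ \lambda\sum_j\lvert\beta_j\rvert$, soft-thresholds each coefficient to $\operatorname{sign}(\hat\beta_j)(\lvert\hat\beta_j\rvert - \lambda)_+$.

The sketch below uses hypothetical helper names and made-up least-squares coefficients.

```python
def ridge_orthonormal(beta_ls, lam):
    """Ridge estimate when X^T X = I: each least-squares coefficient is
    scaled by 1 / (1 + lam), so a nonzero coefficient never becomes 0."""
    return [b / (1.0 + lam) for b in beta_ls]


def lasso_orthonormal(beta_ls, lam):
    """Lasso estimate when X^T X = I: soft-thresholding of each
    least-squares coefficient, sign(b) * max(|b| - lam, 0)."""
    out = []
    for b in beta_ls:
        shrunk = abs(b) - lam
        if shrunk <= 0.0:
            out.append(0.0)  # |b| <= lam: coefficient set exactly to zero
        else:
            out.append(shrunk if b > 0 else -shrunk)
    return out


beta_ls = [3.0, -1.5, 0.4, -0.2]  # hypothetical least-squares estimates
print(ridge_orthonormal(beta_ls, 0.5))  # every entry shrunk, none exactly 0
print(lasso_orthonormal(beta_ls, 0.5))  # [2.5, -1.0, 0.0, 0.0]
```

Ridge shrinks every coefficient proportionally. Lasso subtracts a constant and clips at zero, so any coefficient smaller than $\lambda$ in magnitude is removed.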

Compare Lasso and Ridge

When the true model is sparse (only a few coefficients are nonzero), the lasso tends to do better. Otherwise, the lasso can fit worse than ridge.

There is no general rule of thumb for choosing between them.

Generalization

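Ridge and lasso can be generalized by replacing the penalty with another measure of coefficient size, e.g., $\sum_j \lvert \beta_j \rvert^q$ for some $q \ge 0$:

$$\tilde\beta = \arg\min_\beta \left\{ \sum_{i=1}^N \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^p \lvert\beta_j\rvert^q \right\}.$$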

$q=0$: subset selection (the penalty counts the nonzero coefficients)
$q=1$: lasso
$q=2$: ridge
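A small helper makes the family concrete. It is a sketch with a hypothetical name, `lq_penalty`. Python evaluates `0.0 ** 0` as `1.0`, so $q = 0$ is special-cased to count the nonzero coefficients. That is the convention under which $q = 0$ corresponds to subset selection.

```python
def lq_penalty(beta, q):
    """Return sum_j |beta_j|^q (hypothetical helper).

    For q = 0 the sum is defined as the number of nonzero coefficients
    (convention 0^0 = 0), i.e. the subset-selection penalty.
    """
    if q == 0:
        return sum(1 for b in beta if b != 0)
    return sum(abs(b) ** q for b in beta)


beta = [1.0, -2.0, 0.0, 0.5]
print(lq_penalty(beta, 0))  # 3    (number of nonzero coefficients)
print(lq_penalty(beta, 1))  # 3.5  (lasso penalty)
print(lq_penalty(beta, 2))  # 5.25 (ridge penalty)
```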

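Smaller $q$ makes the constraint region more pointed along the coordinate axes, so solutions are more likely to have coefficients exactly 0. For $q > 1$, $\lvert\beta_j\rvert^q$ is differentiable at 0, so coefficients are generally not set exactly to 0. $q = 1$ is the smallest value for which the constraint region is convex. The boundaries $\lvert\beta_i\rvert^q + \lvert\beta_j\rvert^q = 1$ for several $q$ can be plotted in Mathematica: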

{% highlight text %}
(* Boundary of |beta_i|^q + |beta_j|^q = 1 in the first quadrant, q = 0.5, 1, ..., 4 *)
Plot[Evaluate@Table[(1 - x^q)^(1/q), {q, 0.5, 4, 0.5}], {x, 0, 1},
 AspectRatio -> 1,
 Frame -> True,
 PlotLegends -> Placed[Table["q=" <> ToString@q, {q, 0.5, 4, 0.5}], {Left, Bottom}],
 PlotLabel -> "Shrinkage as function of L-q norm distance",
 FrameLabel -> {Subscript[\[Beta], i], Subscript[\[Beta], j]}]
{% endhighlight %}
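Two numbers summarize what the plot shows for the unit region $\lvert\beta_i\rvert^q + \lvert\beta_j\rvert^q \le 1$:

- The boundary crosses the diagonal at $\beta_i = \beta_j = 2^{-1/q}$. This point moves toward the origin as $q$ shrinks, while the axis intercepts stay at 1.
- The midpoint $(0.5, 0.5)$ of the axis points $(1, 0)$ and $(0, 1)$ has penalty $2 \cdot 0.5^q$. This exceeds 1 for $q < 1$, so the region is not convex there.

The sketch below uses the hypothetical function names `diagonal_crossing` and `midpoint_penalty`.

```python
def diagonal_crossing(q):
    """Point where |b_i|^q + |b_j|^q = 1 meets the diagonal b_i = b_j."""
    return 0.5 ** (1.0 / q)


def midpoint_penalty(q):
    """Penalty |b_i|^q + |b_j|^q at (0.5, 0.5), the midpoint of the
    boundary points (1, 0) and (0, 1). A value > 1 means the midpoint
    lies outside the region, so the region cannot be convex."""
    return 2 * 0.5 ** q


for q in [0.5, 1, 2, 4]:
    print(f"q={q}: diagonal crossing {diagonal_crossing(q):.4f}, "
          f"midpoint penalty {midpoint_penalty(q):.4f}")
```

For $q = 0.5, 1, 2, 4$ the crossings are about 0.25, 0.5, 0.707 and 0.841. The midpoint penalties are about 1.414, 1, 0.5 and 0.125, so only $q = 0.5$ puts the midpoint outside the region.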

Planted: 2016-09-23 by OctoMiao;
