# Regularization

Friday, January 12, 2018

Machine learning models often encounter the problem of overfitting. Regularization is one of the techniques used to address this problem.

In regularization, we add a penalty term to the cost function, so that the cost increases for models that are likely to overfit.

In linear regression,

$$\hat{Y} = \hat{\beta}_{0} + \hat{\beta}_{1}X_{1} + \hat{\beta}_{2}X_{2} + \dots + \hat{\beta}_{p}X_{p}$$

we use the least squares fitting procedure to estimate the regression coefficients $\beta_{0}, \beta_{1}, \beta_{2}, \dots, \beta_{p}$ by minimizing the loss function, the residual sum of squares:

$$RSS = \sum_{i=1}^n \left(y_{i} - \beta_{0} - \sum_{j=1}^p \beta_{j}x_{ij}\right)^2$$

This model can be fitted to a dataset such as `fruit_data_with_colors.txt`.
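As a minimal sketch of the least squares fit described above, the following uses NumPy on synthetic data. The synthetic data, the variable names, and the helper `rss` are assumptions made for illustration; they stand in for `fruit_data_with_colors.txt`, whose columns are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption; it stands in for a real dataset).
n, p = 100, 3
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ true_beta + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept beta_0 is estimated too.
X_design = np.column_stack([np.ones(n), X])

# Least squares estimate: the coefficients that minimize RSS.
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)


def rss(beta, X_design, y):
    """Residual sum of squares for a given coefficient vector."""
    residuals = y - X_design @ beta
    return float(residuals @ residuals)


print("estimated coefficients:", beta_hat)
print("RSS at the estimate:", rss(beta_hat, X_design, y))
```

Any other coefficient vector gives an RSS at least as large as the one at `beta_hat`, which is what "least squares" means here.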

This model can overfit. Some regularization techniques that address this are:

## Ridge regression

In ridge regression, we use the L2 penalty, i.e. we add a penalty equal to the sum of the squared coefficients, so we minimize

$$RSS + \lambda \sum_{j=1}^p\beta_{j}^2$$

Here, $\lambda \geq 0$ is a tuning parameter. When $\lambda = 0$, the penalty term has no effect and ridge regression produces the least squares estimates. As $\lambda \rightarrow \infty$, the impact of the penalty grows and the coefficient estimates shrink toward zero.

NOTE: It is best to apply ridge regression after standardizing the predictors (feature normalization), because the penalty depends on the scale of each coefficient.

Ridge regression reduces overfitting (high variance) through the bias-variance tradeoff. As $\lambda$ increases, the flexibility of the regression fit decreases, leading to decreased variance but increased bias.
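To make the effect of $\lambda$ concrete, here is a small sketch that solves the ridge problem in closed form, $(X^TX + \lambda I)\beta = X^Ty$, on standardized predictors. The function name `ridge_fit`, the synthetic data, and the chosen values of $\lambda$ are assumptions for illustration only.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients on standardized predictors.

    Solves (Xs^T Xs + lam * I) beta = Xs^T (y - mean(y)).
    The intercept is not penalized; with centered predictors it equals mean(y).
    """
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std  # standardize each predictor
    y_mean = y.mean()
    p = Xs.shape[1]
    beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ (y - y_mean))
    return y_mean, beta


# Synthetic data (an assumption made for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([3.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    _, beta = ridge_fit(X, y, lam)
    print(f"lambda={lam:7.1f}  ||beta||_2 = {np.linalg.norm(beta):.4f}")
```

With $\lambda = 0$ this reproduces the least squares coefficients, and the norm of the coefficient vector shrinks as $\lambda$ grows.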

## Lasso regression

In lasso regression, we use the L1 penalty, i.e. we add a penalty equal to the sum of the absolute values of the coefficients, so we minimize

$$RSS + \lambda\sum_{j=1}^p \left| \beta_{j} \right|$$

The benefit of lasso regression is that, unlike the L2 penalty in ridge regression, the L1 penalty can force some of the coefficient estimates to be exactly $0$ when $\lambda$ is sufficiently large. Lasso therefore also performs variable selection.
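The difference between the two penalties can be illustrated with a short sketch. It minimizes $RSS + \lambda\sum_j|\beta_j|$ by cyclic coordinate descent with soft-thresholding, which is one common way to fit the lasso and is swapped in here for illustration. The function names, the synthetic data, and the value of $\lambda$ are assumptions.

```python
import numpy as np


def soft_threshold(rho, t):
    """Shrink rho toward zero by t; return exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)


def lasso_fit(X, y, lam, n_iter=200):
    """Minimize RSS + lam * sum(|beta_j|) by cyclic coordinate descent.

    Assumes X is standardized and y is centered, so no intercept is fitted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta


def ridge_fit_centered(X, y, lam):
    """Ridge solution for the same standardized, centered data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Synthetic data (an assumption): only the first two predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=200)
y = y - y.mean()

lam = 100.0
beta_lasso = lasso_fit(X, y, lam)
beta_ridge = ridge_fit_centered(X, y, lam)
print("lasso:", beta_lasso)
print("ridge:", beta_ridge)
```

In this setup the lasso coefficients of the irrelevant predictors are set exactly to zero, while the ridge coefficients are shrunk but remain nonzero.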

## Regularization in Neural Networks

In neural networks, many regularization techniques are used, such as L2 regularization (Frobenius norm regularization), early stopping, dropout, and many more.
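Two of these techniques can be sketched in a few lines of NumPy. The function names, the scaling of the penalty by a plain factor `lam`, and the use of "inverted" dropout are assumptions made for illustration; frameworks differ in these conventions.

```python
import numpy as np


def l2_penalty(weights, lam):
    """lam times the sum of squared Frobenius norms of the weight matrices."""
    return lam * sum(np.sum(W ** 2) for W in weights)


def dropout(activations, keep_prob, rng):
    """Inverted dropout, applied at training time only.

    Each unit is zeroed with probability 1 - keep_prob, and the survivors
    are rescaled by 1 / keep_prob so the expected activation is unchanged.
    """
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)

# Hypothetical weight matrices of a small two-layer network.
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 1))]
print("L2 penalty added to the cost:", l2_penalty(weights, lam=0.01))

activations = np.ones((2, 5))
print(dropout(activations, keep_prob=0.8, rng=rng))
```

The L2 term is added to the training cost, while dropout is applied to a layer's activations during training and switched off at prediction time.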

In general, there are many regularization techniques, and each has some advantages over the others. The choice of technique depends on the type of problem you’re trying to solve.
