
Linear regression

Linear regression is a training procedure based on a linear model. The model makes a prediction by simply computing a weighted sum of the input features, plus a constant term called the bias term (also called the intercept term, $\theta_0$):

$$ \hat{y}=\theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

This can be written more compactly using vector notation:

$$\hat{y}= h_\theta(x) = \theta^T x$$
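For example, a minimal sketch of this prediction in NumPy (the parameter and feature values below are made up purely for illustration):

import numpy as np

theta = np.array([4.0, 3.0])   # hypothetical parameters: bias theta_0 and weight theta_1
x = np.array([1.0, 2.5])       # feature vector, with x_0 = 1 for the bias term
y_hat = theta.T.dot(x)         # weighted sum theta^T x
y_hat                          # 4.0 + 3.0 * 2.5 = 11.5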


Now that we have our model, how do we train it?

Keep in mind that training the model means adjusting its parameters to reduce the error, that is, to minimize the cost function.

The most common performance measure of a regression model is the Mean Square Error (MSE). Therefore, to train a Linear Regression model, you need to find the value of $\theta$ that minimizes the MSE:

$$ MSE(X,h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^T x^{(i)}-y^{(i)} \right)^2$$

$$ MSE(X,h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)}-y^{(i)} \right)^2$$

$$ MSE(X,h_\theta) = \frac{1}{m} \left(\hat{y}-y \right)^T \left(\hat{y}-y \right)$$
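As a quick sketch of this formula in NumPy (the predictions below are made-up values, only to illustrate the computation):

import numpy as np

y = np.array([4.314825, 10.877373])   # targets
y_hat = np.array([4.3, 10.9])         # hypothetical predictions
mse = np.mean((y_hat - y) ** 2)       # (1/m) * sum of squared errors
mse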

The normal equation

To find the value of $\theta$ that minimizes the cost function, there is a closed-form (direct) solution. This is called the Normal Equation, and it can be derived by writing the MSE in matrix form as $J(\theta)$, differentiating with respect to $\theta$, and setting the result equal to zero:

\begin{align*} \frac{\partial J(\theta)}{\partial \theta} &= 0\\ J(\theta) &=\frac{1}{m}\left(X\theta-y \right)^T\left(X\theta-y \right)=\frac{1}{m}\left[(X\theta)^T-y^T \right] \left[X\theta-y \right] \end{align*}

Just recall that: \begin{align*} (A^T)^T &= A\\ (A+B)^T &= A^T + B^T\\ (kA)^T &= kA^T\\ (AB)^T &= B^T A^T \end{align*}

and that $(X\theta)^T y = y^T (X\theta)$, since both are scalars. Expanding the product:

$$0=\frac{\partial\,}{\partial \theta}\frac{1}{m}\left[(X\theta)^T X\theta - (X\theta)^T y - y^T X\theta + y^T y \right]$$

$$0=\frac{\partial\,}{\partial \theta}\frac{1}{m}\left[\theta^T X^T X\theta - 2(X\theta)^T y + y^T y \right]$$

Taking the derivative with respect to $\theta$:

$$0=\frac{1}{m}\left[2 X^T X \theta - 2 X^T y \right]$$

$$0=\frac{2}{m}\left[X^T X \theta - X^T y \right]$$

Solving for $\theta$ gives the Normal Equation:

$$\hat{\theta} = (X^T X)^{-1} X^{T} y $$

A basic example

First, let's implement the linear regressor using only two instances (points), $(x^0, y^0)$ and $(x^1, y^1)$; thus:

$$\hat{y}^{(i)}=\theta_0 x_0+ \theta_1 x_1^{(i)}$$

but, considering that $x_0=1$, then:

$$\hat{y}^{(i)}=\theta_0 + \theta_1 x_1^{(i)}$$

Then, writing it in vector form:

$$ \hat{y}^0= \begin{bmatrix} \theta_0 & \theta_1 \end{bmatrix} \begin{bmatrix} 1 \\ x^0_1 \end{bmatrix} $$

Now, we need a matrix form to handle all the data:

$$ \begin{bmatrix} \hat{y}^0 \\ \hat{y}^1\\ \end{bmatrix} = \begin{bmatrix} \theta_0 + \theta_1 x^0_1 \\ \theta_0 + \theta_1 x^1_1 \\ \end{bmatrix} $$

$$ \begin{bmatrix} \hat{y}^0 \\ \hat{y}^1\\ \end{bmatrix} = \begin{bmatrix} 1 & x^0_1 \\ 1 & x^1_1 \\ \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \end{bmatrix} $$

Now, let's compute $\theta$ using the normal equation:

In [2]:
import numpy as np
X0 = np.array([[0.001],
               [2.196]])
X = np.c_[np.ones(2), X0]  # add x0 = 1 to each instance
y = np.array([4.314825, 10.877373])
X
Out[2]:
array([[1.000e+00, 1.000e-03],
       [1.000e+00, 2.196e+00]])
In [3]:
theta =  np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
theta
Out[3]:
array([4.31183523, 2.9897713 ])
In [4]:
t0 = theta[0]
t1 = theta[1]
In [5]:
import matplotlib.pyplot as plt
plt.plot(X0,y, '*k')
plt.show()
In [8]:
Xvec = np.linspace(0,2.5,2)
Xnew = np.c_[np.ones((2, 1)), Xvec]
ypre = Xnew.dot(theta)
plt.plot(X0,y, '*k')
plt.plot(Xvec, ypre, '-r')
plt.show()

My Linear Regressor

Instructions: Create your own function to train (estimate) the parameters of a Linear Regressor by implementing the Normal Equation. Test your model with the data generated below.

In [9]:
np.random.seed(42)
X0 = 2 * np.random.rand(100, 1)
X = np.c_[np.ones((100, 1)), X0]  # add x0 = 1 to each instance
y = 4 + 3 * X0 + np.random.randn(100, 1)
plt.plot(X0, y, '.k')
Out[9]:
[<matplotlib.lines.Line2D at 0xffff60435af0>]

Therefore, the model will become:

$$ \begin{bmatrix} \hat{y}^0 \\ \hat{y}^1\\ \vdots \\ \hat{y}^n \end{bmatrix} = \begin{bmatrix} 1 & x^0_1 \\ 1 & x^1_1 \\ \vdots & \vdots \\ 1 & x^n_1 \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} $$
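A minimal sketch of one possible solution is shown below; the function name normal_equation is just an illustrative choice, and it assumes X already includes the bias column $x_0=1$ as in the cell above:

def normal_equation(X, y):
    """Estimate theta with the Normal Equation: (X^T X)^{-1} X^T y."""
    return np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

theta_hat = normal_equation(X, y)
theta_hat  # should be close to the true values 4 and 3 used to generate the data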

Using Sklearn to train the model

Python already includes a linear regression implementation within Scikit-Learn. Note that LinearRegression fits its own intercept, so since X already contains the bias column $x_0=1$, the fitted bias appears in intercept_ and the coefficient reported for that constant column is zero.

In [10]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_
Out[10]:
(array([4.21509616]), array([[0.        , 2.77011339]]))
In [11]:
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]  # add x0 = 1 to each instance
y_Pred=lin_reg.predict(X_new_b)
y_Pred
Out[11]:
array([[4.21509616],
       [9.75532293]])
In [13]:
plt.plot(X0,y, '.k', label='data')
plt.plot(X_new, y_Pred, '-r', label='SK-model')
plt.legend()
Out[13]:
<matplotlib.legend.Legend at 0xffff453bbb50>

Batch Gradient Descent Example for Linear Model

$\nabla_\theta\, MSE(\theta)=\frac{2}{m} X^T(X\theta-y)$

$\theta^{(\text{next step})}=\theta-\eta\, \nabla_\theta\, MSE(\theta)$

where $\eta$ is the learning rate.

In [19]:
import numpy as np
import matplotlib.pyplot as plt

X = 2 * np.random.rand(100, 1)
Xb = np.c_[np.ones((100, 1)), X]  # add x0 = 1 to each instance
y = 4 + 3 * X + np.random.randn(100, 1)
plt.plot(X, y, '.k')
plt.show()
In [29]:
eta = 0.01  # learning rate
ni = 20 # iterations
m = 100 # dataset size
theta = np.random.randn(2,1)  # random initialization
theta
Out[29]:
array([[-0.52563093],
       [ 0.03240701]])
In [24]:
for iteration in range(ni):
    gradients = 2/m * Xb.T.dot(Xb.dot(theta) - y)  # gradient of the MSE: (2/m) X^T (X theta - y)
    theta = theta - eta * gradients                # gradient descent step
theta
Out[24]:
array([[4.24631965],
       [2.93425959]])
In [30]:
for iteration in range(ni):
    gradients = 2/m * Xb.T.dot(Xb.dot(theta) - y)  # gradient of the MSE
    theta = theta - eta * gradients                # gradient descent step
    Xnew = np.array([[0.0], [2.0]])                # two points are enough to draw the line
    Xbw = np.c_[np.ones((2, 1)), Xnew]             # add x0 = 1 to each instance
    ypre = Xbw.dot(theta)
    plt.plot(Xnew, ypre, '-r')                     # fitted line after this iteration
plt.plot(X, y, '.k')                               # training data
plt.show()
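Finally, as a quick sanity check, the estimate found by gradient descent can be compared with the closed-form solution on the same data (a minimal sketch, assuming Xb and y from the cells above):

theta_ne = np.linalg.inv(Xb.T.dot(Xb)).dot(Xb.T).dot(y)  # Normal Equation on the same data
theta, theta_ne  # both estimates should be similar once enough iterations have run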