# Linear regression

Linear regression fits a linear model to data. The model makes a prediction by computing a weighted sum of the input features plus a constant called the bias term (also called the intercept term):

$$ \hat{y}=\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

This can be written more compactly in vector notation for $m$ instances. The model then becomes:

$$
    \begin{bmatrix}
    \hat{y}^{(1)} \\
    \hat{y}^{(2)} \\
    \vdots \\
    \hat{y}^{(m)}
    \end{bmatrix}
    =
    \begin{bmatrix}
    1 & x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\
    1 & x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    1 & x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)}
    \end{bmatrix}
    \begin{bmatrix}
    \theta_0 \\
    \theta_1 \\
    \theta_2 \\
    \vdots \\
    \theta_n
    \end{bmatrix}
$$

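Writing $X$ for the matrix of inputs (with its leading column of ones) and $\theta$ for the parameter vector, this results in:

$$\hat{y}= h_\theta(X) = X \theta $$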
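
The matrix form above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the feature values and parameters are made-up numbers, and `add_bias_column` and `predict` are hypothetical helper names introduced only for this example.

```python
import numpy as np

def add_bias_column(features):
    """Prepend a column of ones so that theta_0 acts as the bias term."""
    features = np.asarray(features, dtype=float)
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

def predict(X, theta):
    """Return the predictions X @ theta, one per row of X."""
    return X @ theta

# Illustrative values (assumed): m = 3 instances, n = 2 features
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])
theta = np.array([0.5, 1.0, -1.0])  # [theta_0, theta_1, theta_2]

X = add_bias_column(features)  # shape (3, 3): ones column plus two features
y_hat = predict(X, theta)      # shape (3,): one prediction per instance
```

Adding the column of ones is what lets the bias $\theta_0$ be handled by the same matrix product as the other parameters.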

**Now that we have our model, how do we train it?**

Training the model means adjusting its parameters to minimize a cost function that measures the prediction error. The most common performance measure for a regression model is the Mean Squared Error (MSE). Therefore, to train a linear regression model, you need to find the value of $\theta$ that minimizes the MSE:

$$ MSE(X,h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)}-y^{(i)} \right)^2$$


$$ MSE(X,h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(  x^{(i)}\theta-y^{(i)} \right)^2$$

$$ MSE(X,h_\theta) = \frac{1}{m} \left( X\theta-y \right)^T \left( X\theta-y \right)$$
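
As a rough sketch, the vectorized form of the MSE can be computed in NumPy as follows. The function name `mse` and the small data set are assumptions made for illustration; `X` is taken to already include the column of ones.

```python
import numpy as np

def mse(X, theta, y):
    """Mean squared error of the linear model X @ theta against targets y."""
    residuals = X @ theta - y
    # For a 1-D array, residuals @ residuals is the sum of squared residuals
    return (residuals @ residuals) / len(y)

# Illustrative values (assumed), with the bias column already included in X
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

perfect = mse(X, np.array([1.0, 2.0]), y)  # theta that fits these points exactly
worse = mse(X, np.array([0.0, 2.0]), y)    # same slope, bias off by 1
```

Here `perfect` is zero because every residual vanishes, while `worse` is larger because each prediction misses its target by the same amount.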

# The normal equation

To find the value of $\theta$ that minimizes the cost function, there is a closed-form solution that gives the result directly. This is called the **Normal Equation**. It is obtained by differentiating the *MSE* with respect to $\theta$, which gives the gradient $\frac{2}{m} X^T (X\theta - y)$, and setting it equal to zero. Assuming $X^T X$ is invertible, solving for $\theta$ yields:


$$\hat{\theta} = (X^T X)^{-1} X^{T} y $$
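
A minimal NumPy sketch of the Normal Equation, under stated assumptions: the data are synthetic (generated from $y = 4 + 3x$ plus small Gaussian noise), `normal_equation` is a hypothetical helper name, and `np.linalg.solve` is used on the system $X^T X \theta = X^T y$ in place of an explicit inverse, which gives the same solution when $X^T X$ is invertible.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta."""
    # Solving the linear system avoids forming the explicit inverse;
    # this assumes X^T X is invertible.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data (assumed): y = 4 + 3 x plus small Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=100)
y = 4.0 + 3.0 * x + rng.normal(0.0, 0.1, size=100)

X = np.column_stack([np.ones_like(x), x])  # bias column plus one feature
theta_hat = normal_equation(X, y)          # approximately [4, 3]
```

The recovered parameters should land close to the values used to generate the data, with the gap depending on the noise level and the number of samples.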

For example, with a single input feature $t$, the model reduces to:

$$ \text{Temp} = \theta_0 + \theta_1 t $$


```python
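import pandas as pd

# Load the dataset; 'data.csv' is expected in the working directory
df = pd.read_csv('data.csv')
df  # display the DataFrame as the cell output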
```