# Linear regression
Linear regression is a training procedure based on a linear model. The model makes a prediction by computing a weighted sum of the input features plus a constant called the bias term (also called the intercept term):
$$ \hat{y}=\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$
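As a minimal sketch of this weighted sum, assuming NumPy is available, the prediction for a single instance could be computed as follows. The function name `predict_one` and the numeric values are illustrative and not part of the original text.

```python
import numpy as np

def predict_one(theta, x):
    """Return theta_0 + theta_1*x_1 + ... + theta_n*x_n for one instance."""
    theta = np.asarray(theta, dtype=float)
    x = np.asarray(x, dtype=float)
    # theta[0] is the bias term; theta[1:] are the feature weights
    return theta[0] + np.dot(theta[1:], x)

# Hypothetical values: bias 1.0, weights 2.0 and -0.5
theta = [1.0, 2.0, -0.5]
x = [3.0, 4.0]
y_hat = predict_one(theta, x)  # 1.0 + 2.0*3.0 - 0.5*4.0 = 5.0
```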
This can be written more compactly in vector notation. For $m$ instances, the model becomes:
$$
\begin{bmatrix}
\hat{y}^{(1)} \\
\hat{y}^{(2)} \\
\hat{y}^{(3)} \\
\vdots \\
\hat{y}^{(m)}
\end{bmatrix}
=
\begin{bmatrix}
1 & x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\
1 & x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)}
\end{bmatrix}
\begin{bmatrix}
\theta_0 \\
\theta_1 \\
\theta_2 \\
\vdots \\
\theta_n
\end{bmatrix}
$$
This gives:
$$\hat{y}= h_\theta(X) = X \theta $$
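The matrix form can be sketched in NumPy as below: a column of ones is prepended to the features so that $\theta_0$ acts as the bias, and a single matrix product yields every prediction. The helper names `add_bias_column` and `predict` and the sample numbers are assumptions made for illustration.

```python
import numpy as np

def add_bias_column(features):
    """Prepend a column of ones so theta_0 acts as the bias term."""
    features = np.asarray(features, dtype=float)
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

def predict(X, theta):
    """Compute y_hat = X @ theta for all instances at once."""
    return X @ np.asarray(theta, dtype=float)

# Hypothetical data: m = 3 instances, n = 2 features
features = [[3.0, 4.0], [0.0, 1.0], [2.0, 2.0]]
theta = [1.0, 2.0, -0.5]
X = add_bias_column(features)   # shape (3, 3)
y_hat = predict(X, theta)       # one prediction per row of X
```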
**Now that we have our model, how do we train it?**
Training the model means adjusting its parameters to minimize a cost function that measures the prediction error. The most common performance measure for a regression model is the Mean Squared Error (MSE). Therefore, to train a linear regression model, you need to find the value of $\theta$ that minimizes the MSE:
$$ MSE(X,h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)}-y^{(i)} \right)^2$$
$$ MSE(X,h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)}\theta-y^{(i)} \right)^2$$
$$ MSE(X,h_\theta) = \frac{1}{m} \left( X\theta-y \right)^T \left( X\theta-y \right)$$
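The matrix form of the MSE translates almost directly into NumPy. The sketch below assumes the first column of the design matrix is all ones; the function name `mse` and the data are made up for illustration.

```python
import numpy as np

def mse(X, theta, y):
    """Mean squared error: (1/m) * (X theta - y)^T (X theta - y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    residual = X @ np.asarray(theta, dtype=float) - y
    return float(residual @ residual) / len(y)

# Hypothetical data generated from y = 1 + 2*x
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
perfect = mse(X, [1.0, 2.0], y)   # theta reproduces y exactly
off = mse(X, [0.0, 2.0], y)       # every prediction is 1 too low
```

Here `perfect` is zero because the parameters match the data exactly, while `off` is the average of three squared errors of size 1.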
# The normal equation
To find the value of $\theta$ that minimizes the cost function, there is a closed-form solution that gives the result directly. This is called the **Normal Equation**, and it can be found by differentiating the *MSE* with respect to $\theta$ and setting the result equal to zero:
$$\hat{\theta} = (X^T X)^{-1} X^{T} y $$
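The intermediate step is short. Differentiating the matrix form of the MSE and setting the gradient to zero gives

$$\nabla_\theta MSE = \frac{2}{m} X^T \left( X\theta - y \right) = 0 \quad\Rightarrow\quad X^T X \theta = X^T y$$

which yields the Normal Equation whenever $X^T X$ is invertible. A minimal NumPy sketch follows; the function name `normal_equation` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.linalg.solve avoids forming the explicit inverse of X^T X
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic, noise-free data generated from y = 4 + 3*x (assumed for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 4.0 + 3.0 * x
theta_hat = normal_equation(X, y)
```

Solving the linear system is mathematically equivalent to multiplying by $(X^T X)^{-1}$ and is usually better behaved numerically. If $X^T X$ is singular, `np.linalg.lstsq` or `np.linalg.pinv` are possible alternatives.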
$$ \text{Temp} = \theta_0 + \theta_1 t $$
```python
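import pandas as pd

# Load the dataset from a CSV file in the working directory
df = pd.read_csv('data.csv')
# Display the DataFrame (rendered automatically in a notebook cell)
df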
```
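Once the data is loaded, the one-feature model $Temp = \theta_0 + \theta_1 t$ can be fit with the same Normal Equation. The layout of `data.csv` is not shown here, so this sketch uses made-up arrays in place of the real columns; the function name `fit_line` and the values of `t` and `temp` are hypothetical.

```python
import numpy as np

def fit_line(t, temp):
    """Fit temp = theta_0 + theta_1 * t with the normal equation."""
    t = np.asarray(t, dtype=float)
    temp = np.asarray(temp, dtype=float)
    X = np.column_stack([np.ones_like(t), t])   # bias column, then t
    theta_0, theta_1 = np.linalg.solve(X.T @ X, X.T @ temp)
    return theta_0, theta_1

# Made-up measurements, not taken from data.csv
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
temp = np.array([20.0, 20.5, 21.0, 21.5, 22.0])
theta_0, theta_1 = fit_line(t, temp)
```

With a real file, `t` and `temp` would come from whichever columns of `df` hold the time and temperature values.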