{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear regression\n", "\n", "Linear regression is a training procedure based on a linear model. The model makes a prediction by computing a weighted sum of the input features, plus a constant term called the bias term (also called the intercept term). With the convention $x_0 = 1$, the bias term is $\\theta_0$:\n", "\n", "$$ \\hat{y}=\\theta_0 x_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\cdots + \\theta_n x_n$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This can be written more compactly in vector notation for $m$ samples. The model then becomes:\n", "\n", "$$ \n", " \\begin{bmatrix}\n", " \\hat{y}^{(1)} \\\\ \n", " \\hat{y}^{(2)}\\\\\n", " \\vdots \\\\\n", " \\hat{y}^{(m)}\n", " \\end{bmatrix}\n", " =\n", " \\begin{bmatrix}\n", " 1 & x_1^{(1)} & x_2^{(1)} & \\cdots & x_n^{(1)}\\\\\n", " 1 & x_1^{(2)} & x_2^{(2)} & \\cdots & x_n^{(2)}\\\\\n", " \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", " 1 & x_1^{(m)} & x_2^{(m)} & \\cdots & x_n^{(m)}\n", " \\end{bmatrix}\n", " \\begin{bmatrix}\n", " \\theta_0 \\\\\n", " \\theta_1 \\\\\n", " \\theta_2 \\\\\n", " \\vdots \\\\\n", " \\theta_n\n", " \\end{bmatrix}\n", "$$\n", "Resulting in:\n", "\n", "$$\\hat{y}= h_\\theta(X) = X \\theta $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Now that we have our model, how do we train it?**\n", "\n", "Training the model means adjusting its parameters to minimize a cost function that measures the prediction error. The most common performance measure of a regression model is the Mean Squared Error (MSE). 
Therefore, to train a Linear Regression model, you need to find the value of $\\theta$ that minimizes the MSE:\n", "\n", "$$ MSE(X,h_\\theta) = \\frac{1}{m} \\sum_{i=1}^{m} \\left(\\hat{y}^{(i)}-y^{(i)} \\right)^2$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ MSE(X,h_\\theta) = \\frac{1}{m} \\sum_{i=1}^{m} \\left( x^{(i)}\\theta-y^{(i)} \\right)^2$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ MSE(X,h_\\theta) = \\frac{1}{m} \\left( X\\theta-y \\right)^T \\left( X\\theta-y \\right)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The normal equation\n", "\n", "To find the value of $\\theta$ that minimizes the cost function, there is a closed-form solution that gives the result directly. This is called the **Normal Equation**, and it can be found by differentiating the *MSE* with respect to $\\theta$ and setting the result equal to zero:\n", "\n", "\n", "$$\\hat{\\theta} = (X^T X)^{-1} X^{T} y $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ Temp = \\theta_0 + \\theta_1 * t $$" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | 0 | \n", "
---|---|
0 | \n", "24.218 | \n", "
1 | \n", "23.154 | \n", "
2 | \n", "24.347 | \n", "
3 | \n", "24.411 | \n", "
4 | \n", "24.411 | \n", "
... | \n", "... | \n", "
295 | \n", "46.357 | \n", "
296 | \n", "46.551 | \n", "
297 | \n", "46.519 | \n", "
298 | \n", "46.551 | \n", "
299 | \n", "46.583 | \n", "
300 rows × 1 columns
\n", "