basic information to start example

main
Gerardo Marx 2 months ago
commit 76636b1843

@ -0,0 +1,63 @@
# Linear regression
Linear regression is a training procedure based on a linear model. The model makes a prediction by computing a weighted sum of the input features plus a constant called the bias term (also called the intercept term):
$$ \hat{y}=\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$
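As a minimal sketch of this weighted sum for a single sample, assuming NumPy and made-up values for `theta` and `x` (neither comes from the original data):

```python
import numpy as np

# Hypothetical parameters and one input sample (illustrative values only).
theta = np.array([1.0, 2.0, -0.5])  # [theta_0 (bias), theta_1, theta_2]
x = np.array([3.0, 4.0])            # [x_1, x_2]

# Weighted sum of the features plus the bias term.
y_hat = theta[0] + np.dot(theta[1:], x)
print(y_hat)
```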
This can be written more compactly in vector notation for $m$ samples. The model then becomes:
$$
\begin{bmatrix}
\hat{y}^{(1)} \\
\hat{y}^{(2)} \\
\hat{y}^{(3)} \\
\vdots \\
\hat{y}^{(m)}
\end{bmatrix}
=
\begin{bmatrix}
1 & x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)}\\
1 & x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)}
\end{bmatrix}
\begin{bmatrix}
\theta_0 \\
\theta_1 \\
\theta_2 \\
\vdots \\
\theta_n
\end{bmatrix}
$$
This results in:
$$\hat{y}= h_\theta(X) = X \theta $$
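The matrix form can be sketched in NumPy as follows. The `features` array and `theta` values are hypothetical; the only point is that a leading column of ones lets $\theta_0$ act as the bias term.

```python
import numpy as np

# Hypothetical feature matrix: m = 3 samples, n = 2 features.
features = np.array([[3.0, 4.0],
                     [0.0, 1.0],
                     [2.0, 2.0]])
theta = np.array([1.0, 2.0, -0.5])  # [theta_0, theta_1, theta_2]

# Prepend a column of ones so that theta_0 is multiplied by 1 in every row.
X = np.c_[np.ones(len(features)), features]

# One matrix-vector product gives the predictions for all samples at once.
y_hat = X @ theta
print(y_hat)
```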
**Now that we have our model, how do we train it?**
Training the model means adjusting its parameters to minimize a cost function that measures the prediction error. The most common performance measure for a regression model is the Mean Squared Error (MSE). Therefore, to train a linear regression model, you need to find the value of $\theta$ that minimizes the MSE:
$$ MSE(X,h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)}-y^{(i)} \right)^2$$
$$ MSE(X,h_\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)}\theta-y^{(i)} \right)^2$$
$$ MSE(X,h_\theta) = \frac{1}{m} \left( X\theta-y \right)^T \left( X\theta-y \right)$$
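To illustrate that the summation form and the matrix form give the same number, here is a small sketch. The function names `mse_sum` and `mse_matrix` and the toy data are assumptions made for this example.

```python
import numpy as np

def mse_sum(X, y, theta):
    """MSE as the mean of the squared per-sample errors."""
    return np.mean((X @ theta - y) ** 2)

def mse_matrix(X, y, theta):
    """MSE as (1/m) * (X theta - y)^T (X theta - y)."""
    residual = X @ theta - y
    return (residual @ residual) / len(y)

# Toy data generated by y = 1 + 2 t (illustrative values only).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

theta_exact = np.array([1.0, 2.0])  # reproduces y exactly
theta_off = np.array([0.0, 2.0])    # every prediction is off by 1

print(mse_sum(X, y, theta_exact), mse_matrix(X, y, theta_off))
```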
# The normal equation
To find the value of $\theta$ that minimizes the cost function, there is a closed-form solution that gives the result directly. This is called the **Normal Equation**, and it is found by differentiating the *MSE* with respect to $\theta$ and setting the derivative equal to zero:
$$\hat{\theta} = (X^T X)^{-1} X^{T} y $$
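A short sketch of where this expression comes from, starting from the matrix form of the MSE and assuming that $X^T X$ is invertible. Setting the gradient to zero gives:
$$ \nabla_\theta MSE = \frac{2}{m} X^T \left( X\theta - y \right) = 0 $$
$$ X^T X \theta = X^T y $$
Multiplying both sides on the left by $(X^T X)^{-1}$ yields the expression for $\hat{\theta}$ above.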
For the example data, the model has a single input $t$:
$$ \mathrm{Temp} = \theta_0 + \theta_1 t $$
```python
import pandas as pd
df = pd.read_csv('data.csv')
df
```
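A possible next step is to apply the normal equation to a model of the form $\theta_0 + \theta_1 t$. The sketch below is not the author's implementation. It uses synthetic data in place of `data.csv` so that it runs on its own, it assumes that $t$ is simply the sample index, and it solves the linear system $X^T X \theta = X^T y$ with `np.linalg.solve` instead of forming the inverse explicitly. Mathematically this is the same solution.

```python
import numpy as np

# Synthetic stand-in for the single column in data.csv (assumption: the real
# values could be loaded with df.iloc[:, 0].to_numpy() instead).
rng = np.random.default_rng(0)
t = np.arange(300, dtype=float)  # assumed: the sample index plays the role of t
temp = 24.0 + 0.08 * t + rng.normal(0.0, 0.5, size=t.size)

# Design matrix with a leading column of ones for theta_0.
X = np.c_[np.ones_like(t), t]

# Normal equation, solved as a linear system rather than via an explicit inverse.
theta_hat = np.linalg.solve(X.T @ X, X.T @ temp)

# Cross-check against NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, temp, rcond=None)

print("theta_0, theta_1 =", theta_hat)
```

The recovered parameters should be close to the values used to generate the synthetic data (24.0 and 0.08), which gives a quick sanity check before switching to the measured values.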

@ -0,0 +1,301 @@
0
24.218
23.154
24.347
24.411
24.411
24.347
24.314
24.347
24.347
23.896
24.476
24.637
24.669
24.669
25.056
25.088
24.991
25.088
25.217
25.281
25.313
25.668
25.668
25.636
26.022
25.926
19.126
26.248
26.248
26.055
25.152
26.699
26.989
26.957
27.021
27.118
27.247
27.344
27.666
27.183
27.795
27.892
28.021
28.311
28.214
28.504
28.536
28.762
28.826
28.858
29.245
29.181
29.374
29.6
29.567
29.793
29.761
29.89
30.147
30.147
30.438
30.599
30.728
30.856
30.76
31.018
31.114
31.34
31.533
31.501
31.727
31.469
32.017
32.081
32.113
32.5
32.403
32.403
32.693
32.726
32.887
33.016
33.048
33.08
33.37
33.37
33.499
33.725
33.789
33.821
34.047
34.079
34.144
34.305
34.434
34.434
34.659
34.756
34.659
34.691
34.917
34.981
34.981
35.271
35.4
35.336
35.239
35.594
35.626
35.819
26.796
35.948
27.408
36.174
35.304
36.271
36.528
36.561
36.689
36.657
36.979
36.979
37.044
37.205
37.173
37.237
37.205
37.302
37.656
37.56
37.592
37.882
37.882
37.817
38.043
37.173
38.269
38.365
38.397
38.591
33.016
26.022
38.913
38.945
38.913
38.945
38.945
39.235
39.203
39.268
39.3
39.493
39.042
39.59
39.622
39.654
39.815
39.88
39.912
39.912
40.009
40.009
40.234
40.234
40.234
40.363
40.524
40.524
40.557
40.557
40.653
40.814
40.557
40.911
40.879
41.072
41.169
41.104
41.072
41.104
41.137
41.523
41.33
41.523
41.523
41.62
41.813
41.781
41.846
41.813
41.942
42.136
42.136
42.136
42.136
42.104
42.168
42.361
42.458
42.232
42.49
42.361
42.394
42.426
42.394
42.716
42.748
42.813
42.651
42.813
42.748
42.941
43.103
43.135
43.103
43.038
43.135
43.264
43.425
43.328
43.328
43.457
43.457
43.521
43.683
43.779
43.683
43.683
43.715
43.973
43.94
44.102
44.005
44.005
44.005
44.23
44.359
44.424
44.392
44.327
44.327
44.424
44.521
43.779
44.682
44.714
44.649
44.649
44.746
44.778
44.907
44.972
42.2
44.939
45.036
44.907
44.327
43.876
45.004
45.197
45.294
45.358
45.326
45.229
45.358
45.101
45.423
45.391
45.713
45.681
45.616
45.713
45.616
45.713
45.713
45.713
45.745
45.648
45.971
45.938
45.938
45.938
46.067
45.971
46.035
46.132
46.196
45.938
46.164
46.261
46.261
46.229
46.261
46.229
46.229
46.357
46.551
46.519
46.551
46.583

@ -0,0 +1,226 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear regression\n",
"\n",
"The linear regression is a training procedure based on a linear model. The model makes a prediction by simply computing a weighted sum of the input features, plus a constant term called the bias term (also called the intercept term):\n",
"\n",
"$$ \\hat{y}=\\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\cdots + \\theta_n x_n$$\n",
"\n",
"This can be writen more easy by using vector notation form for $m$ values. Therefore, the model will become:\n",
"\n",
"$$ \n",
" \\begin{bmatrix}\n",
" \\hat{y}^0 \\\\ \n",
" \\hat{y}^1\\\\\n",
" \\hat{y}^2\\\\\n",
" \\vdots \\\\\n",
" \\hat{y}^m\n",
" \\end{bmatrix}\n",
" =\n",
" \\begin{bmatrix}\n",
" 1 & x_1^0 & x_2^0 & \\cdots &x_n^0\\\\\n",
" 1 & x_1^1 & x_2^1 & \\cdots & x_n^1\\\\\n",
" \\vdots & \\vdots &\\vdots & \\cdots & \\vdots\\\\\n",
" 1 & x_1^m & x_2^m & \\cdots & x_n^m\n",
" \\end{bmatrix}\n",
"\n",
" \\begin{bmatrix}\n",
" \\theta_0 \\\\\n",
" \\theta_1 \\\\\n",
" \\theta_2 \\\\\n",
" \\vdots \\\\\n",
" \\theta_n\n",
" \\end{bmatrix}\n",
"$$\n",
"\n",
"Resulting:\n",
"\n",
"$$\\hat{y}= h_\\theta(x) = x \\theta $$\n",
"\n",
"**Now that we have our mode, how do we train it?**\n",
"\n",
"Please, consider that training the model means adjusting the parameters to reduce the error or minimizing the cost function. The most common performance measure of a regression model is the Mean Square Error (MSE). Therefore, to train a Linear Regression model, you need to find the value of θ that minimizes the MSE:\n",
"\n",
"$$ MSE(X,h_\\theta) = \\frac{1}{m} \\sum_{i=1}^{m} \\left(\\hat{y}^{(i)}-y^{(i)} \\right)^2$$\n",
"\n",
"\n",
"$$ MSE(X,h_\\theta) = \\frac{1}{m} \\sum_{i=1}^{m} \\left( x^{(i)}\\theta-y^{(i)} \\right)^2$$\n",
"\n",
"$$ MSE(X,h_\\theta) = \\frac{1}{m} \\left( x\\theta-y \\right)^T \\left( x\\theta-y \\right)$$\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# The normal equation\n",
"\n",
"To find the value of $\\theta$ that minimizes the cost function, there is a closed-form solution that gives the result directly. This is called the **Normal Equation**; and can be find it by derivating the *MSE* equation as a function of $\\theta$ and making it equals to zero:\n",
"\n",
"\n",
"$$\\hat{\\theta} = (X^T X)^{-1} X^{T} y $$\n",
"\n",
"$$ Temp = \\theta_0 + \\theta_1 * t $$\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>24.218</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>23.154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>24.347</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>24.411</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.411</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>295</th>\n",
" <td>46.357</td>\n",
" </tr>\n",
" <tr>\n",
" <th>296</th>\n",
" <td>46.551</td>\n",
" </tr>\n",
" <tr>\n",
" <th>297</th>\n",
" <td>46.519</td>\n",
" </tr>\n",
" <tr>\n",
" <th>298</th>\n",
" <td>46.551</td>\n",
" </tr>\n",
" <tr>\n",
" <th>299</th>\n",
" <td>46.583</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>300 rows × 1 columns</p>\n",
"</div>"
],
"text/plain": [
" 0\n",
"0 24.218\n",
"1 23.154\n",
"2 24.347\n",
"3 24.411\n",
"4 24.411\n",
".. ...\n",
"295 46.357\n",
"296 46.551\n",
"297 46.519\n",
"298 46.551\n",
"299 46.583\n",
"\n",
"[300 rows x 1 columns]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('data.csv')\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'df' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[1], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43mdf\u001b[49m)\n",
"\u001b[0;31mNameError\u001b[0m: name 'df' is not defined"
]
}
],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}