# Ordinary least squares
Ordinary Least Squares (OLS) is an important method in machine learning and statistics for several reasons. It is a straightforward way to fit **linear models**: it chooses the parameters that minimize the sum of squared differences between the observed and predicted values, which makes it intuitive to understand. OLS is also the foundation of linear regression, one of the most widely used techniques in machine learning and statistics. Linear regression is valuable for modeling relationships between variables when you suspect that the relationship is linear.
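
To make the idea of minimizing squared differences concrete, here is a minimal sketch using NumPy. The synthetic data, the true coefficients (2 and 3), and the helper name `ssr` are assumptions chosen only for illustration; the fit relies on `np.linalg.lstsq` rather than the formula from the lectures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2 + 3x + noise (values chosen only for illustration)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=200)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the theta that minimizes ||X @ theta - y||^2
theta, *_ = np.linalg.lstsq(X, y, rcond=None)


def ssr(theta, X, y):
    """Sum of squared residuals for parameters theta (hypothetical helper)."""
    residuals = y - X @ theta
    return float(residuals @ residuals)


ssr_fit = ssr(theta, X, y)
# Any other parameter vector gives a larger sum of squared residuals
ssr_other = ssr(theta + np.array([0.5, -0.1]), X, y)
print("theta:", theta)
print("SSR at the fit:", ssr_fit, "SSR at a perturbed theta:", ssr_other)
```

The fitted `theta` should land close to the coefficients used to generate the data, and moving away from it in any direction increases the SSR.
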

OLS also provides a baseline model for comparison. When developing more complex machine learning models, it is common to start with a simple linear regression model (OLS) to assess the predictive power of your features and establish a benchmark for model performance. The coefficients (parameters) of the model are also easy to interpret: they represent the strength and direction of the relationship between the input variables and the targets. This interpretability is important in many applications, especially when explaining results to non-technical stakeholders.

While OLS is valuable in many scenarios, it's essential to acknowledge its limitations, especially when dealing with nonlinear relationships or complex data structures. More advanced machine learning techniques like decision trees, neural networks, or support vector machines may be more appropriate in such cases. However, OLS remains a foundational method with enduring relevance in machine learning and statistics.
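
The limitation with nonlinear relationships can be seen in a small experiment. The sketch below is only an illustration under assumed data (a noisy parabola): a straight-line OLS fit leaves a large sum of squared residuals, while the same solver given an extra `x**2` column fits well. The function name `fit_and_ssr` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed nonlinear data: y = x^2 + noise
x = rng.uniform(-3.0, 3.0, size=300)
y = x**2 + rng.normal(0.0, 0.5, size=300)


def fit_and_ssr(X, y):
    """Fit by least squares and return (theta, sum of squared residuals)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ theta
    return theta, float(residuals @ residuals)


# Straight line: intercept + slope * x
X_line = np.column_stack([np.ones_like(x), x])
# Same method with an added quadratic feature, for comparison only
X_quad = np.column_stack([np.ones_like(x), x, x**2])

_, ssr_line = fit_and_ssr(X_line, y)
_, ssr_quad = fit_and_ssr(X_quad, y)
print("SSR, straight line:", ssr_line)
print("SSR, with x^2 feature:", ssr_quad)
```

The second model is still linear in its parameters, so it remains an OLS fit; the straight line fails here because of the chosen features, not because of the solver.
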
# Work for students
1. In your report, explain how to obtain the OLS parameters (thetas)
2. Create a function that computes the Sum of Squared Residuals (SSR) by applying the formula proposed in the lectures
3. Compute and plot the SSR's performance by applying a multi-level factorial design of experiments (DOE) with:
   - number of points: *1,000*, *10,000*, *1,000,000*, and *1,000,000,000*
   - deviation: *0.1*, *1*, *2*, *3*, *5*
4. Measure the time consumed by your OLS function (to compute the thetas) with *1,000*, *10,000*, *1,000,000*, and *1,000,000,000* points
5. Explore your generated data using histograms and dispersion (scatter) plots
6. In the dispersion plots, add your model's performance
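
One possible way to organize the factorial experiment and the timing is sketched below. Everything in it is an assumption made for illustration: the data-generating line (intercept 1, slope 2), the helper names `make_data`, `fit_ols`, and `ssr`, and the use of `np.linalg.lstsq` in place of the formula from the lectures. The largest level (1,000,000,000 points) is left out of the sketch because a single float64 array of that size takes about 8 GB of memory.

```python
import itertools
import time

import numpy as np

# Hypothetical ground truth used only to generate synthetic data
TRUE_INTERCEPT, TRUE_SLOPE = 1.0, 2.0


def make_data(n_points, deviation, rng):
    """Generate x uniformly and y on a line plus Gaussian noise."""
    x = rng.uniform(0.0, 10.0, size=n_points)
    noise = rng.normal(0.0, deviation, size=n_points)
    return x, TRUE_INTERCEPT + TRUE_SLOPE * x + noise


def fit_ols(x, y):
    """Return [intercept, slope] from a least-squares fit."""
    X = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


def ssr(theta, x, y):
    """Sum of squared residuals of the fitted line."""
    residuals = y - (theta[0] + theta[1] * x)
    return float(residuals @ residuals)


sizes = [1_000, 10_000, 1_000_000]  # add 1_000_000_000 only if memory allows
deviations = [0.1, 1, 2, 3, 5]

rng = np.random.default_rng(42)
results = []
# Full factorial design: every size combined with every deviation
for n, dev in itertools.product(sizes, deviations):
    x, y = make_data(n, dev, rng)
    start = time.perf_counter()
    theta = fit_ols(x, y)
    elapsed = time.perf_counter() - start  # time spent computing thetas only
    results.append(
        {"n": n, "deviation": dev, "ssr": ssr(theta, x, y), "seconds": elapsed}
    )

for row in results:
    print(row)
```

The `results` list holds one row per combination, so it can be passed to a plotting library to draw SSR and time against the number of points for each deviation.
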