You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

899 KiB

None <html lang="en"> <head> </head>

Titulo del proyecto

Module 1: Logistic Regression Classifier

Author: Sofia Samaniego Lopez

Institution: Universidad Autonoma de Baja California (UABC)

Advisor: Dr. Gerardo Marx Chavez Campos

This script establishes the baseline evaluation for binary classification using the classic Iris dataset. The objective of this specific work is to analyze linear combination, decision boundaries, and probability estimation based on morphological features—specifically petal width and length.

The model utilizes Scikit-Learn's LogisticRegression framework to execute the optimization process and map inputs to a probability output using the Sigmoid function. This code serves as a strict benchmark comparison, generating the reference ground truth metrics and spatial visualizations required to evaluate the performance and accuracy of subsequent custom implementations.

Experimental Setup and Preliminary Analysis

Step 1: Import Required Libraries and Environment Setup

In this initial stage, the necessary scientific computing and data processing libraries are imported to set up our development environment:

  • NumPy: Utilized for efficient multi-dimensional array operations and core matrix algebra.
  • Matplotlib (pyplot): Employed to generate the spatial scatter plots and map the resulting decision boundaries.
  • Scikit-Learn: Specifically importing the datasets module to fetch the target morphological data and LogisticRegression to serve as our validation benchmark baseline.
In [10]:
!pip3 install scikit-learn
!pip3 install matplotlib
!pip3 install numpy

import matplotlib.pyplot as plt
import numpy as np

Step 2: Load and Explore the Iris Dataset Characteristics

The classic Iris Dataset is loaded into the workspace to establish our baseline classification task.

Dataset Overview and Morphological Features

The complete dataset consists of 150 samples from three distinct species of Iris flowers (Iris setosa, Iris versicolor, and Iris virginica). For each sample, four continuous geometric features are available:

  1. Sepal Length
  2. Sepal Width
  3. Petal Length
  4. Petal Width

Problem Simplification & Single-Feature Evaluation

The classification task is constrained to analyze the Sigmoid function and linear thresholds on a simpler scale:

  • Target Binarization: The multi-class target is converted to binary labels ($y=1$ for Iris virginica, $y=0$ for others) to establish a clear threshold.
  • Single-Feature Isolation: Instead of combining dimensions, features are evaluated one at a time (e.g., petal width or sepal length) to inspect their independent separation power before building multi-dimensional models.
In [11]:
from sklearn import datasets
iris=datasets.load_iris()
print(iris.DESCR)
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica

:Summary Statistics:

============== ==== ==== ======= ===== ====================
                Min  Max   Mean    SD   Class Correlation
============== ==== ==== ======= ===== ====================
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
============== ==== ==== ======= ===== ====================

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

.. dropdown:: References

  - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
    Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
    Mathematical Statistics" (John Wiley, NY, 1950).
  - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
    (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
  - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
    Structure and Classification Rule for Recognition in Partially Exposed
    Environments".  IEEE Transactions on Pattern Analysis and Machine
    Intelligence, Vol. PAMI-2, No. 1, 67-71.
  - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
    on Information Theory, May 1972, 431-433.
  - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
    conceptual clustering system finds 3 classes in the data.
  - Many, many more ...

Step 3: Exploratory Data Analysis & Target Inspection

Prior to model optimization, a visual and structural inspection evaluates the data distribution:

  • Unclassified Spatial Mapping: Plotting Sepal Length (sl) vs. Sepal Width (sw) using plain markers ('.k') reveals the raw data structure. This helps verify if the morphological features naturally form distinct clusters before applying any algorithmic boundaries.
  • Target Label Mapping (iris.target): Inspecting the ground-truth array to map the multi-class numerical taxonomy:
    • 0: Iris setosa
    • 1: Iris versicolor
    • 2: Iris virginica

This inspection links the visual spatial groups with their mathematical labels, establishing the baseline before data binarization.

In [12]:
sl = iris.data[:, 0:1]
sw = iris.data[:, 1:2]
plt.plot(sl,sw,'.k')
plt.show()
No description has been provided for this image
In [13]:
iris.target
Out[13]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Step 4: Theoretical Sigmoid Function & Decision Boundary

Generating a synthetic domain from -10 to 10 to plot the standalone mathematical Sigmoid function:

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$

This visualizes how the curve maps inputs to a probability between 0 and 1, establishing the inflection point $\sigma(0) = 0.5$ as the theoretical threshold for the decision boundary.

In [14]:
t= np.linspace(-10,10,100)
sig = 1/(1+np.exp(-t))
plt.plot(t,sig, '.b', label=r"$\sigma$")
plt.legend(loc='upper left', fontsize=20)
plt.show()
No description has been provided for this image

Model Training and Benchmark Evaluation

Model 1: Iris-Setosa Classifier based on petal width

Feature Selection & Setosa Target Binarization

The dataset is filtered and restructured to evaluate a new binary classification task:

  • Feature Vector ($X$): Slicing index [:, 3:] isolates Petal Width as the continuous predictor variable.
  • Target Binarization ($y$): The criteria (iris.target == 0).astype(int) shifts the positive class ($y=1$) exclusively to Iris setosa, mapping all other species to $0$. This creates the target array seen in the output.
In [15]:
x = iris.data[:, 3:]
y = (iris.target == 0).astype(int)
y
Out[15]:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Benchmark Model Initialization and Fitting

This cell instantiates and trains the baseline classification model using Scikit-Learn:

  • LogisticRegression(solver='lbfgs', random_state=42): Instantiates the model using the Limited-memory BFGS (lbfgs) optimization solver and locks the random_state to 42 to ensure reproducible weight initialization.
  • mylr.fit(x, y): Trains the classifier on the isolated feature vector x and the binarized target labels y. This optimization process computes the optimal weight ($w$) and bias ($b$) parameters that minimize the loss function.
In [16]:
from sklearn.linear_model import LogisticRegression
mylr = LogisticRegression(solver='lbfgs', random_state=42)
mylr.fit(x,y)
Out[16]:
LogisticRegression(random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Parameters
random_state random_state: int, RandomState instance, default=None

Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the
data. See :term:`Glossary <random_state>` for details.
42
penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2'

Specify the norm of the penalty:

- `None`: no penalty is added;
- `'l2'`: add an L2 penalty term and it is the default choice;
- `'l1'`: add an L1 penalty term;
- `'elasticnet'`: both L1 and L2 penalty terms are added.

.. warning::
Some penalties may not work with some solvers. See the parameter
`solver` below, to know the compatibility between the penalty and
solver.

.. versionadded:: 0.19
l1 penalty with SAGA solver (allowing 'multinomial' + L1)

.. deprecated:: 1.8
`penalty` was deprecated in version 1.8 and will be removed in 1.10.
Use `l1_ratio` and `C` instead. `l1_ratio=0` for `penalty='l2'`,
`l1_ratio=1` for `penalty='l1'`, `l1_ratio` set to any float between 0 and 1
for `penalty='elasticnet'`, and `C=np.inf` for `penalty=None`.
'deprecated'
C C: float, default=1.0

Inverse of regularization strength; must be a positive float.
Like in support vector machines, smaller values specify stronger
regularization. `C=np.inf` results in unpenalized logistic regression.
For a visual example on the effect of tuning the `C` parameter
with an L1 penalty, see:
:ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.
1.0
l1_ratio l1_ratio: float, default=0.0

The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting
`l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty.
Any value between 0 and 1 gives an Elastic-Net penalty of the form
`l1_ratio * L1 + (1 - l1_ratio) * L2`.

.. warning::
Certain values of `l1_ratio`, i.e. some penalties, may not work with some
solvers. See the parameter `solver` below, to know the compatibility between
the penalty and solver.

.. versionchanged:: 1.8
Default value changed from None to 0.0.

.. deprecated:: 1.8
`None` is deprecated and will be removed in version 1.10. Always use
`l1_ratio` to specify the penalty type.
0.0
dual dual: bool, default=False

Dual (constrained) or primal (regularized, see also
:ref:`this equation <regularized-logistic-loss>`) formulation. Dual formulation
is only implemented for l2 penalty with liblinear solver. Prefer `dual=False`
when n_samples > n_features.
False
tol tol: float, default=1e-4

Tolerance for stopping criteria.
0.0001
fit_intercept fit_intercept: bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be
added to the decision function.
True
intercept_scaling intercept_scaling: float, default=1

Useful only when the solver `liblinear` is used
and `self.fit_intercept` is set to `True`. In this case, `x` becomes
`[x, self.intercept_scaling]`,
i.e. a "synthetic" feature with constant value equal to
`intercept_scaling` is appended to the instance vector.
The intercept becomes
``intercept_scaling * synthetic_feature_weight``.

.. note::
The synthetic feature weight is subject to L1 or L2
regularization as all other features.
To lessen the effect of regularization on synthetic feature weight
(and therefore on the intercept) `intercept_scaling` has to be increased.
1
class_weight class_weight: dict or 'balanced', default=None

Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.

The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.

Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.

.. versionadded:: 0.17
*class_weight='balanced'*
None
solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

Algorithm to use in the optimization problem. Default is 'lbfgs'.
To choose a solver, you might want to consider the following aspects:

- 'lbfgs' is a good default solver because it works reasonably well for a wide
class of problems.
- For :term:`multiclass` problems (`n_classes >= 3`), all solvers except
'liblinear' minimize the full multinomial loss, 'liblinear' will raise an
error.
- 'newton-cholesky' is a good choice for
`n_samples` >> `n_features * n_classes`, especially with one-hot encoded
categorical features with rare categories. Be aware that the memory usage
of this solver has a quadratic dependency on `n_features * n_classes`
because it explicitly computes the full Hessian matrix.
- For small datasets, 'liblinear' is a good choice, whereas 'sag'
and 'saga' are faster for large ones;
- 'liblinear' can only handle binary classification by default. To apply a
one-versus-rest scheme for the multiclass setting one can wrap it with the
:class:`~sklearn.multiclass.OneVsRestClassifier`.

.. warning::
The choice of the algorithm depends on the penalty chosen (`l1_ratio=0`
for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for
Elastic-Net) and on (multinomial) multiclass support:

================= ======================== ======================
solver l1_ratio multinomial multiclass
================= ======================== ======================
'lbfgs' l1_ratio=0 yes
'liblinear' l1_ratio=1 or l1_ratio=0 no
'newton-cg' l1_ratio=0 yes
'newton-cholesky' l1_ratio=0 yes
'sag' l1_ratio=0 yes
'saga' 0<=l1_ratio<=1 yes
================= ======================== ======================

.. note::
'sag' and 'saga' fast convergence is only guaranteed on features
with approximately the same scale. You can preprocess the data with
a scaler from :mod:`sklearn.preprocessing`.

.. seealso::
Refer to the :ref:`User Guide <Logistic_regression>` for more
information regarding :class:`LogisticRegression` and more specifically the
:ref:`Table <logistic_regression_solvers>`
summarizing solver/penalty supports.

.. versionadded:: 0.17
Stochastic Average Gradient (SAG) descent solver. Multinomial support in
version 0.18.
.. versionadded:: 0.19
SAGA solver.
.. versionchanged:: 0.22
The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
.. versionadded:: 1.2
newton-cholesky solver. Multinomial support in version 1.6.
'lbfgs'
max_iter max_iter: int, default=100

Maximum number of iterations taken for the solvers to converge.
100
verbose verbose: int, default=0

For the liblinear and lbfgs solvers set verbose to any positive
number for verbosity.
0
warm_start warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
Useless for liblinear solver. See :term:`the Glossary <warm_start>`.

.. versionadded:: 0.17
*warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.
False
n_jobs n_jobs: int, default=None

Does not have any effect.

.. deprecated:: 1.8
`n_jobs` is deprecated in version 1.8 and will be removed in 1.10.
None
In [17]:
Xnew = np.linspace(-1,3,100).reshape(-1,1)
yPred = mylr.predict_proba(Xnew)
#plt.plot(Xnew, yPred[:,0], label= 'No Iris')
plt.plot(Xnew, yPred[:,1], label= 'Yes Iris')
plt.legend()
plt.plot(x,y,'*g')
plt.show()
No description has been provided for this image

This plot visualizes the trained model's Sigmoid prediction curve over the experimental dataset samples:

  • Sample Distribution (Green Stars): Represents the real dataset. Small petal widths (0.1 - 0.6 cm) belong to Iris setosa ($y=1$), while larger widths (1.0 - 2.5 cm) belong to the other species ($y=0$).
  • Sigmoid Mapping (Blue Curve): Displays an inverted logistic curve. It demonstrates the mathematical relationship: as petal width increases, the probability of the flower being Iris setosa drops sharply from 1.0 to 0.0.
  • Decision Boundary Threshold: The curve crosses the 0.5 probability threshold at approximately 0.75 cm. This inflection point defines the exact baseline boundary separating both classifications.

Model 2: Iris-Setosa Classifier based on petal length

Feature Shift Petal Length Isolation

The model configuration is updated to evaluate a different morphological predictor:

  • Feature Vector ($X$): Slicing index [:, 2:3] isolates Petal Length as the independent variable.
  • Target Continuity ($y$): The classification objective remains focused on Iris setosa (iris.target == 0) to compare the separation power of petal length against the previous petal width baseline.
In [18]:
x = iris.data[:, 2:3]
y = (iris.target == 0).astype(int)
In [19]:
from sklearn.linear_model import LogisticRegression
mylr = LogisticRegression(solver='lbfgs', random_state=42)
mylr.fit(x,y)
Out[19]:
LogisticRegression(random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Parameters
random_state random_state: int, RandomState instance, default=None

Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the
data. See :term:`Glossary <random_state>` for details.
42
penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2'

Specify the norm of the penalty:

- `None`: no penalty is added;
- `'l2'`: add an L2 penalty term and it is the default choice;
- `'l1'`: add an L1 penalty term;
- `'elasticnet'`: both L1 and L2 penalty terms are added.

.. warning::
Some penalties may not work with some solvers. See the parameter
`solver` below, to know the compatibility between the penalty and
solver.

.. versionadded:: 0.19
l1 penalty with SAGA solver (allowing 'multinomial' + L1)

.. deprecated:: 1.8
`penalty` was deprecated in version 1.8 and will be removed in 1.10.
Use `l1_ratio` and `C` instead. `l1_ratio=0` for `penalty='l2'`,
`l1_ratio=1` for `penalty='l1'`, `l1_ratio` set to any float between 0 and 1
for `penalty='elasticnet'`, and `C=np.inf` for `penalty=None`.
'deprecated'
C C: float, default=1.0

Inverse of regularization strength; must be a positive float.
Like in support vector machines, smaller values specify stronger
regularization. `C=np.inf` results in unpenalized logistic regression.
For a visual example on the effect of tuning the `C` parameter
with an L1 penalty, see:
:ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.
1.0
l1_ratio l1_ratio: float, default=0.0

The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting
`l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty.
Any value between 0 and 1 gives an Elastic-Net penalty of the form
`l1_ratio * L1 + (1 - l1_ratio) * L2`.

.. warning::
Certain values of `l1_ratio`, i.e. some penalties, may not work with some
solvers. See the parameter `solver` below, to know the compatibility between
the penalty and solver.

.. versionchanged:: 1.8
Default value changed from None to 0.0.

.. deprecated:: 1.8
`None` is deprecated and will be removed in version 1.10. Always use
`l1_ratio` to specify the penalty type.
0.0
dual dual: bool, default=False

Dual (constrained) or primal (regularized, see also
:ref:`this equation <regularized-logistic-loss>`) formulation. Dual formulation
is only implemented for l2 penalty with liblinear solver. Prefer `dual=False`
when n_samples > n_features.
False
tol tol: float, default=1e-4

Tolerance for stopping criteria.
0.0001
fit_intercept fit_intercept: bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be
added to the decision function.
True
intercept_scaling intercept_scaling: float, default=1

Useful only when the solver `liblinear` is used
and `self.fit_intercept` is set to `True`. In this case, `x` becomes
`[x, self.intercept_scaling]`,
i.e. a "synthetic" feature with constant value equal to
`intercept_scaling` is appended to the instance vector.
The intercept becomes
``intercept_scaling * synthetic_feature_weight``.

.. note::
The synthetic feature weight is subject to L1 or L2
regularization as all other features.
To lessen the effect of regularization on synthetic feature weight
(and therefore on the intercept) `intercept_scaling` has to be increased.
1
class_weight class_weight: dict or 'balanced', default=None

Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.

The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.

Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.

.. versionadded:: 0.17
*class_weight='balanced'*
None
solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

Algorithm to use in the optimization problem. Default is 'lbfgs'.
To choose a solver, you might want to consider the following aspects:

- 'lbfgs' is a good default solver because it works reasonably well for a wide
class of problems.
- For :term:`multiclass` problems (`n_classes >= 3`), all solvers except
'liblinear' minimize the full multinomial loss, 'liblinear' will raise an
error.
- 'newton-cholesky' is a good choice for
`n_samples` >> `n_features * n_classes`, especially with one-hot encoded
categorical features with rare categories. Be aware that the memory usage
of this solver has a quadratic dependency on `n_features * n_classes`
because it explicitly computes the full Hessian matrix.
- For small datasets, 'liblinear' is a good choice, whereas 'sag'
and 'saga' are faster for large ones;
- 'liblinear' can only handle binary classification by default. To apply a
one-versus-rest scheme for the multiclass setting one can wrap it with the
:class:`~sklearn.multiclass.OneVsRestClassifier`.

.. warning::
The choice of the algorithm depends on the penalty chosen (`l1_ratio=0`
for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for
Elastic-Net) and on (multinomial) multiclass support:

================= ======================== ======================
solver l1_ratio multinomial multiclass
================= ======================== ======================
'lbfgs' l1_ratio=0 yes
'liblinear' l1_ratio=1 or l1_ratio=0 no
'newton-cg' l1_ratio=0 yes
'newton-cholesky' l1_ratio=0 yes
'sag' l1_ratio=0 yes
'saga' 0<=l1_ratio<=1 yes
================= ======================== ======================

.. note::
'sag' and 'saga' fast convergence is only guaranteed on features
with approximately the same scale. You can preprocess the data with
a scaler from :mod:`sklearn.preprocessing`.

.. seealso::
Refer to the :ref:`User Guide <Logistic_regression>` for more
information regarding :class:`LogisticRegression` and more specifically the
:ref:`Table <logistic_regression_solvers>`
summarizing solver/penalty supports.

.. versionadded:: 0.17
Stochastic Average Gradient (SAG) descent solver. Multinomial support in
version 0.18.
.. versionadded:: 0.19
SAGA solver.
.. versionchanged:: 0.22
The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
.. versionadded:: 1.2
newton-cholesky solver. Multinomial support in version 1.6.
'lbfgs'
max_iter max_iter: int, default=100

Maximum number of iterations taken for the solvers to converge.
100
verbose verbose: int, default=0

For the liblinear and lbfgs solvers set verbose to any positive
number for verbosity.
0
warm_start warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
Useless for liblinear solver. See :term:`the Glossary <warm_start>`.

.. versionadded:: 0.17
*warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.
False
n_jobs n_jobs: int, default=None

Does not have any effect.

.. deprecated:: 1.8
`n_jobs` is deprecated in version 1.8 and will be removed in 1.10.
None
In [20]:
Xnew = np.linspace(0,8,100).reshape(-1,1)
yPred = mylr.predict_proba(Xnew)
#plt.plot(Xnew, yPred[:,0], label= 'No Iris')
plt.plot(Xnew, yPred[:,1], label= 'Yes Iris')
plt.legend()
plt.plot(x,y,'*g')
plt.axis([1.5, 5, -0.5, 1.5])
plt.show()
No description has been provided for this image

This plot illustrates the performance of the second univariable model using Petal Length:

  • Sample Distribution: Samples with short petal lengths (1.0 - 2.0 cm) are correctly clustered as Iris setosa ($y=1$), while samples with larger lengths ($>3.0$ cm) map to $y=0$.
  • Sigmoid Mapping: The descending blue curve demonstrates that as petal length increases, the probability of the sample being Iris setosa drops sharply from 1.0 to 0.0.
  • Decision Boundary: The curve crosses the 0.5 probability threshold at approximately 2.5 cm, marking the exact inflection point that separates the target class from the rest of the dataset.

Model 3: Iris-Setosa Classifier based on Sepal length

Feature Shift Sepal Length Isolation

The model evaluates a third morphological predictor independently:

  • Feature Vector ($X$): Slicing index [:, 0:1] isolates Sepal Length as the continuous independent variable.
  • Target Continuity ($y$): The objective remains focused on Iris setosa ($y=1$) to compare the separation power of sepal dimensions against the previous petal metrics.
In [21]:
x = iris.data[:, 0:1]
y = (iris.target == 0).astype(int)
from sklearn.linear_model import LogisticRegression
mylr = LogisticRegression(solver='lbfgs', random_state=42)
mylr.fit(x,y)
Out[21]:
LogisticRegression(random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Parameters
random_state random_state: int, RandomState instance, default=None

Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the
data. See :term:`Glossary <random_state>` for details.
42
penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2'

Specify the norm of the penalty:

- `None`: no penalty is added;
- `'l2'`: add an L2 penalty term and it is the default choice;
- `'l1'`: add an L1 penalty term;
- `'elasticnet'`: both L1 and L2 penalty terms are added.

.. warning::
Some penalties may not work with some solvers. See the parameter
`solver` below, to know the compatibility between the penalty and
solver.

.. versionadded:: 0.19
l1 penalty with SAGA solver (allowing 'multinomial' + L1)

.. deprecated:: 1.8
`penalty` was deprecated in version 1.8 and will be removed in 1.10.
Use `l1_ratio` and `C` instead. `l1_ratio=0` for `penalty='l2'`,
`l1_ratio=1` for `penalty='l1'`, `l1_ratio` set to any float between 0 and 1
for `penalty='elasticnet'`, and `C=np.inf` for `penalty=None`.
'deprecated'
C C: float, default=1.0

Inverse of regularization strength; must be a positive float.
Like in support vector machines, smaller values specify stronger
regularization. `C=np.inf` results in unpenalized logistic regression.
For a visual example on the effect of tuning the `C` parameter
with an L1 penalty, see:
:ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.
1.0
l1_ratio l1_ratio: float, default=0.0

The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting
`l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty.
Any value between 0 and 1 gives an Elastic-Net penalty of the form
`l1_ratio * L1 + (1 - l1_ratio) * L2`.

.. warning::
Certain values of `l1_ratio`, i.e. some penalties, may not work with some
solvers. See the parameter `solver` below, to know the compatibility between
the penalty and solver.

.. versionchanged:: 1.8
Default value changed from None to 0.0.

.. deprecated:: 1.8
`None` is deprecated and will be removed in version 1.10. Always use
`l1_ratio` to specify the penalty type.
0.0
dual dual: bool, default=False

Dual (constrained) or primal (regularized, see also
:ref:`this equation <regularized-logistic-loss>`) formulation. Dual formulation
is only implemented for l2 penalty with liblinear solver. Prefer `dual=False`
when n_samples > n_features.
False
tol tol: float, default=1e-4

Tolerance for stopping criteria.
0.0001
fit_intercept fit_intercept: bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be
added to the decision function.
True
intercept_scaling intercept_scaling: float, default=1

Useful only when the solver `liblinear` is used
and `self.fit_intercept` is set to `True`. In this case, `x` becomes
`[x, self.intercept_scaling]`,
i.e. a "synthetic" feature with constant value equal to
`intercept_scaling` is appended to the instance vector.
The intercept becomes
``intercept_scaling * synthetic_feature_weight``.

.. note::
The synthetic feature weight is subject to L1 or L2
regularization as all other features.
To lessen the effect of regularization on synthetic feature weight
(and therefore on the intercept) `intercept_scaling` has to be increased.
1
class_weight class_weight: dict or 'balanced', default=None

Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.

The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.

Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.

.. versionadded:: 0.17
*class_weight='balanced'*
None
solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

Algorithm to use in the optimization problem. Default is 'lbfgs'.
To choose a solver, you might want to consider the following aspects:

- 'lbfgs' is a good default solver because it works reasonably well for a wide
class of problems.
- For :term:`multiclass` problems (`n_classes >= 3`), all solvers except
'liblinear' minimize the full multinomial loss, 'liblinear' will raise an
error.
- 'newton-cholesky' is a good choice for
`n_samples` >> `n_features * n_classes`, especially with one-hot encoded
categorical features with rare categories. Be aware that the memory usage
of this solver has a quadratic dependency on `n_features * n_classes`
because it explicitly computes the full Hessian matrix.
- For small datasets, 'liblinear' is a good choice, whereas 'sag'
and 'saga' are faster for large ones;
- 'liblinear' can only handle binary classification by default. To apply a
one-versus-rest scheme for the multiclass setting one can wrap it with the
:class:`~sklearn.multiclass.OneVsRestClassifier`.

.. warning::
The choice of the algorithm depends on the penalty chosen (`l1_ratio=0`
for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for
Elastic-Net) and on (multinomial) multiclass support:

================= ======================== ======================
solver l1_ratio multinomial multiclass
================= ======================== ======================
'lbfgs' l1_ratio=0 yes
'liblinear' l1_ratio=1 or l1_ratio=0 no
'newton-cg' l1_ratio=0 yes
'newton-cholesky' l1_ratio=0 yes
'sag' l1_ratio=0 yes
'saga' 0<=l1_ratio<=1 yes
================= ======================== ======================

.. note::
'sag' and 'saga' fast convergence is only guaranteed on features
with approximately the same scale. You can preprocess the data with
a scaler from :mod:`sklearn.preprocessing`.

.. seealso::
Refer to the :ref:`User Guide <Logistic_regression>` for more
information regarding :class:`LogisticRegression` and more specifically the
:ref:`Table <logistic_regression_solvers>`
summarizing solver/penalty supports.

.. versionadded:: 0.17
Stochastic Average Gradient (SAG) descent solver. Multinomial support in
version 0.18.
.. versionadded:: 0.19
SAGA solver.
.. versionchanged:: 0.22
The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
.. versionadded:: 1.2
newton-cholesky solver. Multinomial support in version 1.6.
'lbfgs'
max_iter max_iter: int, default=100

Maximum number of iterations taken for the solvers to converge.
100
verbose verbose: int, default=0

For the liblinear and lbfgs solvers set verbose to any positive
number for verbosity.
0
warm_start warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
Useless for liblinear solver. See :term:`the Glossary <warm_start>`.

.. versionadded:: 0.17
*warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.
False
n_jobs n_jobs: int, default=None

Does not have any effect.

.. deprecated:: 1.8
`n_jobs` is deprecated in version 1.8 and will be removed in 1.10.
None
In [22]:
Xnew = np.linspace(0,8,100).reshape(-1,1)
yPred = mylr.predict_proba(Xnew)
#plt.plot(Xnew, yPred[:,0], label= 'No Iris')
plt.plot(Xnew, yPred[:,1], label= 'Yes Iris')
plt.legend()
plt.plot(x,y,'*g')
plt.axis([3.5, 7, -0.1, 1.1])
plt.show()
No description has been provided for this image

This plot displays the performance of the third univariable model using Sepal Length:

  • Sample Distribution: Samples representing Iris setosa ($y=1$) are concentrated at shorter lengths, but show a much higher spatial overlap with non-setosa samples ($y=0$) compared to the previous petal features.
  • Sigmoid Mapping: The descending curve shows the probability dropping as sepal length increases. Due to this significant data overlap, the slope is less steep, indicating a more gradual and less aggressive probabilistic transition.
  • Decision Boundary: The inflection point at $\sigma = 0.5$ establishes the final threshold. This boundary carries more classification uncertainty because sepal dimensions are naturally less distinct between these species.

Model 4: Multiple features classifier

Multi-Class Spatial Mapping (Sepal Features)

This cell upgrades the initial exploratory plot by adding the ground-truth class labels to the 2D sepal feature space:

  • Feature Interaction: Maps Sepal Length (sl) against Sepal Width (sw) simultaneously to analyze their combined distribution.
  • Class Color-Coding: Differentiates the three original species using distinct markers: Green for Setosa, Red for Versicolor, and Blue for Virginica.
  • Visual Separability Analysis: Allows immediate observation of the data structure, showing that while Setosa forms a perfectly isolated cluster, Versicolor and Virginica exhibit significant spatial overlap, justifying the need for optimization models.
In [23]:
import matplotlib.pyplot as plt
sl = iris.data[:,0:1]
sw = iris.data[:,1:2]
tg = iris.target
plt.plot(sl[tg==0,0], sw[tg==0,0],'.g' ,label='Set')
plt.plot(sl[tg==1,0], sw[tg==1,0],'.r', label='Ver')
plt.plot(sl[tg==2,0], sw[tg==2,0],'.b', label='Vir')
plt.legend()
plt.show()
No description has been provided for this image

Bivariate Model Training for Iris Virginica

This cell configures and trains a multi-feature logistic regression model utilizing tuned optimization parameters:

  • Bivariate Data Selection: * Features (X): Slices index [:, 0:2] to combine Sepal Length and Sepal Width into a two-dimensional feature space.
    • Target (y): Shifts the positive class focus exclusively to Iris virginica (iris.target == 2).
  • Hyperparameter Tuning (mylrvir):
    • solver='newton-cg': Uses the Newton-Conjugate Gradient method to compute accurate optimization paths.
    • C=100 & tol=1e-5: Applies high cost (low regularization) to allow a tighter fit to the data, paired with a strict tolerance for precise convergence.
  • mylrvir.fit(X, y): Trains the system to find the optimal weight vector $w = [w_1, w_2]$ and bias ($b$), establishing the multi-variable benchmark line.
In [24]:
X = iris.data[:,0:2]
y = (iris.target==2).astype(int)
mylrvir = LogisticRegression(
    random_state=22,
    tol=1e-5,
    C=100,
    max_iter=100,
    solver='newton-cg'
)
mylrvir.fit(X,y)
Out[24]:
LogisticRegression(C=100, random_state=22, solver='newton-cg', tol=1e-05)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Parameters
C C: float, default=1.0

Inverse of regularization strength; must be a positive float.
Like in support vector machines, smaller values specify stronger
regularization. `C=np.inf` results in unpenalized logistic regression.
For a visual example on the effect of tuning the `C` parameter
with an L1 penalty, see:
:ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.
100
tol tol: float, default=1e-4

Tolerance for stopping criteria.
1e-05
random_state random_state: int, RandomState instance, default=None

Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the
data. See :term:`Glossary <random_state>` for details.
22
solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

Algorithm to use in the optimization problem. Default is 'lbfgs'.
To choose a solver, you might want to consider the following aspects:

- 'lbfgs' is a good default solver because it works reasonably well for a wide
class of problems.
- For :term:`multiclass` problems (`n_classes >= 3`), all solvers except
'liblinear' minimize the full multinomial loss, 'liblinear' will raise an
error.
- 'newton-cholesky' is a good choice for
`n_samples` >> `n_features * n_classes`, especially with one-hot encoded
categorical features with rare categories. Be aware that the memory usage
of this solver has a quadratic dependency on `n_features * n_classes`
because it explicitly computes the full Hessian matrix.
- For small datasets, 'liblinear' is a good choice, whereas 'sag'
and 'saga' are faster for large ones;
- 'liblinear' can only handle binary classification by default. To apply a
one-versus-rest scheme for the multiclass setting one can wrap it with the
:class:`~sklearn.multiclass.OneVsRestClassifier`.

.. warning::
The choice of the algorithm depends on the penalty chosen (`l1_ratio=0`
for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for
Elastic-Net) and on (multinomial) multiclass support:

================= ======================== ======================
solver l1_ratio multinomial multiclass
================= ======================== ======================
'lbfgs' l1_ratio=0 yes
'liblinear' l1_ratio=1 or l1_ratio=0 no
'newton-cg' l1_ratio=0 yes
'newton-cholesky' l1_ratio=0 yes
'sag' l1_ratio=0 yes
'saga' 0<=l1_ratio<=1 yes
================= ======================== ======================

.. note::
'sag' and 'saga' fast convergence is only guaranteed on features
with approximately the same scale. You can preprocess the data with
a scaler from :mod:`sklearn.preprocessing`.

.. seealso::
Refer to the :ref:`User Guide <Logistic_regression>` for more
information regarding :class:`LogisticRegression` and more specifically the
:ref:`Table <logistic_regression_solvers>`
summarizing solver/penalty supports.

.. versionadded:: 0.17
Stochastic Average Gradient (SAG) descent solver. Multinomial support in
version 0.18.
.. versionadded:: 0.19
SAGA solver.
.. versionchanged:: 0.22
The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
.. versionadded:: 1.2
newton-cholesky solver. Multinomial support in version 1.6.
'newton-cg'
penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2'

Specify the norm of the penalty:

- `None`: no penalty is added;
- `'l2'`: add an L2 penalty term and it is the default choice;
- `'l1'`: add an L1 penalty term;
- `'elasticnet'`: both L1 and L2 penalty terms are added.

.. warning::
Some penalties may not work with some solvers. See the parameter
`solver` below, to know the compatibility between the penalty and
solver.

.. versionadded:: 0.19
l1 penalty with SAGA solver (allowing 'multinomial' + L1)

.. deprecated:: 1.8
`penalty` was deprecated in version 1.8 and will be removed in 1.10.
Use `l1_ratio` and `C` instead. `l1_ratio=0` for `penalty='l2'`,
`l1_ratio=1` for `penalty='l1'`, `l1_ratio` set to any float between 0 and 1
for `penalty='elasticnet'`, and `C=np.inf` for `penalty=None`.
'deprecated'
l1_ratio l1_ratio: float, default=0.0

The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting
`l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty.
Any value between 0 and 1 gives an Elastic-Net penalty of the form
`l1_ratio * L1 + (1 - l1_ratio) * L2`.

.. warning::
Certain values of `l1_ratio`, i.e. some penalties, may not work with some
solvers. See the parameter `solver` below, to know the compatibility between
the penalty and solver.

.. versionchanged:: 1.8
Default value changed from None to 0.0.

.. deprecated:: 1.8
`None` is deprecated and will be removed in version 1.10. Always use
`l1_ratio` to specify the penalty type.
0.0
dual dual: bool, default=False

Dual (constrained) or primal (regularized, see also
:ref:`this equation <regularized-logistic-loss>`) formulation. Dual formulation
is only implemented for l2 penalty with liblinear solver. Prefer `dual=False`
when n_samples > n_features.
False
fit_intercept fit_intercept: bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be
added to the decision function.
True
intercept_scaling intercept_scaling: float, default=1

Useful only when the solver `liblinear` is used
and `self.fit_intercept` is set to `True`. In this case, `x` becomes
`[x, self.intercept_scaling]`,
i.e. a "synthetic" feature with constant value equal to
`intercept_scaling` is appended to the instance vector.
The intercept becomes
``intercept_scaling * synthetic_feature_weight``.

.. note::
The synthetic feature weight is subject to L1 or L2
regularization as all other features.
To lessen the effect of regularization on synthetic feature weight
(and therefore on the intercept) `intercept_scaling` has to be increased.
1
class_weight class_weight: dict or 'balanced', default=None

Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.

The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.

Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.

.. versionadded:: 0.17
*class_weight='balanced'*
None
max_iter max_iter: int, default=100

Maximum number of iterations taken for the solvers to converge.
100
verbose verbose: int, default=0

For the liblinear and lbfgs solvers set verbose to any positive
number for verbosity.
0
warm_start warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
Useless for liblinear solver. See :term:`the Glossary <warm_start>`.

.. versionadded:: 0.17
*warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.
False
n_jobs n_jobs: int, default=None

Does not have any effect.

.. deprecated:: 1.8
`n_jobs` is deprecated in version 1.8 and will be removed in 1.10.
None

Coordinate Grid Generation & Probability Mapping

This block builds the mathematical testing grid used to map out the complete probability landscape:

  • np.meshgrid: Generates a dense $100 \times 100$ coordinate grid across the sepal feature space (Length: 38 cm, Width: 06 cm).
  • Xnew (np.c_): Flattens and couples the grid matrices into a matrix of 10,000 discrete 2D spatial coordinates.
  • predict_proba(Xnew): Evaluates the trained model across the entire grid, computing the continuous probabilities needed to render the decision contours and 3D surfaces.
In [25]:
x0, x1 = np.meshgrid(
    np.linspace(3,8,100).reshape(-1,1),
    np.linspace(0,6,100).reshape(-1,1)
)
Xnew = np.c_[x0.ravel(), x1.ravel()]
yPred = mylrvir.predict_proba(Xnew)
In [26]:
plt.figure(figsize=(10,4))
plt.plot(X[y==0,0], X[y==0,1],'bs',label='No Virg')
plt.plot(X[y==1,0], X[y==1,1],'g^',label='Virginica')
zz=yPred[:,1].reshape(x0.shape)
contour=plt.contour(x0,x1,zz)
plt.clabel(contour, inline=1,fontsize=15)
plt.xlabel("Sepal Length")
plt.ylabel("Sepal Width")
plt.legend()
plt.show()
No description has been provided for this image

This plot visualizes the continuous probability space generated by the trained bivariate model:

  • Sample Distribution: Blue squares represent non-virginica samples ($y=0$), and green triangles represent Iris virginica ($y=1$) mapped across Sepal Length and Sepal Width.
  • Probability Contours (plt.contour): The labeled contour lines map specific probability thresholds. They show how the model's prediction confidence transitions across the 2D space.
  • Decision Boundary: The contour line labeled 0.5 marks the exact geometric threshold. Any sample falling past this line is classified as Iris virginica, capturing the spatial trade-off between both sepal measurements.
In [27]:
fig, ax =plt.subplots(subplot_kw={"projection": "3d"})
surf = ax.plot_surface(x0,x1,zz, cmap='jet')
ax.scatter(iris.data[:,0:1], iris.data[:,1:2], y, 'or')
Out[27]:
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x2a3c5a946e0>
No description has been provided for this image

This cell projects the bivariate logistic regression model into a 3D coordinate space to visualize the complete probability landscape:

  • Axis Dimensions: The horizontal axes represent Sepal Length ($x_1$) and Sepal Width ($x_2$), while the vertical axis ($Z$) tracks the continuous model probability $\sigma(z) \in [0, 1]$.
  • Probability Surface (plot_surface): The Sigmoid function is rendered as a 3D sheet using the jet colormap. It displays the non-linear S-curve transition dynamically stretched across the two-dimensional feature plane.
  • True Labels Spatial Scatter (ax.scatter): The red markers plot the actual samples at their exact spatial coordinates and true binary height ($z = 1$ for Iris virginica, $z = 0$ for others). This highlights how the optimized surface splits the space to fit the data points.

Modelo 5: Multiple features and muticlass classifier

Multi-Feature & Multi-Class Model Training

This cell configures and trains the final baseline model to handle all three species simultaneously within a two-dimensional feature space:

  • Bivariate Inputs (X): Slices index [:, 0:2] to utilize both Sepal Length and Sepal Width as the predictor variables.
  • Multi-Class Target (y): Retains the original multi-class target labels (0, 1, 2) without applying binarization, expanding the task from a single boundary to a three-class decision space.
  • LogisticRegression(C=100, solver='lbfgs'): Trains a multinomial classifier. The algorithm optimizes a distinct set of weights and biases for each target category, preparing the model to partition the 2D plane into three distinct classification zones.
In [28]:
X = iris.data[:,0:2]
y = iris.target
lrmc = LogisticRegression( 
    solver='lbfgs',
    C=100,
    random_state=22
)
lrmc.fit(X,y)
Out[28]:
LogisticRegression(C=100, random_state=22)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Parameters
C C: float, default=1.0

Inverse of regularization strength; must be a positive float.
Like in support vector machines, smaller values specify stronger
regularization. `C=np.inf` results in unpenalized logistic regression.
For a visual example on the effect of tuning the `C` parameter
with an L1 penalty, see:
:ref:`sphx_glr_auto_examples_linear_model_plot_logistic_path.py`.
100
random_state random_state: int, RandomState instance, default=None

Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the
data. See :term:`Glossary <random_state>` for details.
22
penalty penalty: {'l1', 'l2', 'elasticnet', None}, default='l2'

Specify the norm of the penalty:

- `None`: no penalty is added;
- `'l2'`: add an L2 penalty term and it is the default choice;
- `'l1'`: add an L1 penalty term;
- `'elasticnet'`: both L1 and L2 penalty terms are added.

.. warning::
Some penalties may not work with some solvers. See the parameter
`solver` below, to know the compatibility between the penalty and
solver.

.. versionadded:: 0.19
l1 penalty with SAGA solver (allowing 'multinomial' + L1)

.. deprecated:: 1.8
`penalty` was deprecated in version 1.8 and will be removed in 1.10.
Use `l1_ratio` and `C` instead. `l1_ratio=0` for `penalty='l2'`,
`l1_ratio=1` for `penalty='l1'`, `l1_ratio` set to any float between 0 and 1
for `penalty='elasticnet'`, and `C=np.inf` for `penalty=None`.
'deprecated'
l1_ratio l1_ratio: float, default=0.0

The Elastic-Net mixing parameter, with `0 <= l1_ratio <= 1`. Setting
`l1_ratio=1` gives a pure L1-penalty, setting `l1_ratio=0` a pure L2-penalty.
Any value between 0 and 1 gives an Elastic-Net penalty of the form
`l1_ratio * L1 + (1 - l1_ratio) * L2`.

.. warning::
Certain values of `l1_ratio`, i.e. some penalties, may not work with some
solvers. See the parameter `solver` below, to know the compatibility between
the penalty and solver.

.. versionchanged:: 1.8
Default value changed from None to 0.0.

.. deprecated:: 1.8
`None` is deprecated and will be removed in version 1.10. Always use
`l1_ratio` to specify the penalty type.
0.0
dual dual: bool, default=False

Dual (constrained) or primal (regularized, see also
:ref:`this equation <regularized-logistic-loss>`) formulation. Dual formulation
is only implemented for l2 penalty with liblinear solver. Prefer `dual=False`
when n_samples > n_features.
False
tol tol: float, default=1e-4

Tolerance for stopping criteria.
0.0001
fit_intercept fit_intercept: bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be
added to the decision function.
True
intercept_scaling intercept_scaling: float, default=1

Useful only when the solver `liblinear` is used
and `self.fit_intercept` is set to `True`. In this case, `x` becomes
`[x, self.intercept_scaling]`,
i.e. a "synthetic" feature with constant value equal to
`intercept_scaling` is appended to the instance vector.
The intercept becomes
``intercept_scaling * synthetic_feature_weight``.

.. note::
The synthetic feature weight is subject to L1 or L2
regularization as all other features.
To lessen the effect of regularization on synthetic feature weight
(and therefore on the intercept) `intercept_scaling` has to be increased.
1
class_weight class_weight: dict or 'balanced', default=None

Weights associated with classes in the form ``{class_label: weight}``.
If not given, all classes are supposed to have weight one.

The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.

Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.

.. versionadded:: 0.17
*class_weight='balanced'*
None
solver solver: {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'

Algorithm to use in the optimization problem. Default is 'lbfgs'.
To choose a solver, you might want to consider the following aspects:

- 'lbfgs' is a good default solver because it works reasonably well for a wide
class of problems.
- For :term:`multiclass` problems (`n_classes >= 3`), all solvers except
'liblinear' minimize the full multinomial loss, 'liblinear' will raise an
error.
- 'newton-cholesky' is a good choice for
`n_samples` >> `n_features * n_classes`, especially with one-hot encoded
categorical features with rare categories. Be aware that the memory usage
of this solver has a quadratic dependency on `n_features * n_classes`
because it explicitly computes the full Hessian matrix.
- For small datasets, 'liblinear' is a good choice, whereas 'sag'
and 'saga' are faster for large ones;
- 'liblinear' can only handle binary classification by default. To apply a
one-versus-rest scheme for the multiclass setting one can wrap it with the
:class:`~sklearn.multiclass.OneVsRestClassifier`.

.. warning::
The choice of the algorithm depends on the penalty chosen (`l1_ratio=0`
for L2-penalty, `l1_ratio=1` for L1-penalty and `0 < l1_ratio < 1` for
Elastic-Net) and on (multinomial) multiclass support:

================= ======================== ======================
solver l1_ratio multinomial multiclass
================= ======================== ======================
'lbfgs' l1_ratio=0 yes
'liblinear' l1_ratio=1 or l1_ratio=0 no
'newton-cg' l1_ratio=0 yes
'newton-cholesky' l1_ratio=0 yes
'sag' l1_ratio=0 yes
'saga' 0<=l1_ratio<=1 yes
================= ======================== ======================

.. note::
'sag' and 'saga' fast convergence is only guaranteed on features
with approximately the same scale. You can preprocess the data with
a scaler from :mod:`sklearn.preprocessing`.

.. seealso::
Refer to the :ref:`User Guide <Logistic_regression>` for more
information regarding :class:`LogisticRegression` and more specifically the
:ref:`Table <logistic_regression_solvers>`
summarizing solver/penalty supports.

.. versionadded:: 0.17
Stochastic Average Gradient (SAG) descent solver. Multinomial support in
version 0.18.
.. versionadded:: 0.19
SAGA solver.
.. versionchanged:: 0.22
The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
.. versionadded:: 1.2
newton-cholesky solver. Multinomial support in version 1.6.
'lbfgs'
max_iter max_iter: int, default=100

Maximum number of iterations taken for the solvers to converge.
100
verbose verbose: int, default=0

For the liblinear and lbfgs solvers set verbose to any positive
number for verbosity.
0
warm_start warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
Useless for liblinear solver. See :term:`the Glossary <warm_start>`.

.. versionadded:: 0.17
*warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.
False
n_jobs n_jobs: int, default=None

Does not have any effect.

.. deprecated:: 1.8
`n_jobs` is deprecated in version 1.8 and will be removed in 1.10.
None

Multi-Class Grid Generation and Probability Evaluation

This cell sets up the coordinate testing matrix to evaluate the multi-class prediction behavior across the entire sepal feature space:

  • np.meshgrid: Constructs a dense $100 \times 100$ coordinate grid bounding the Sepal Length (38 cm) and Sepal Width (06 cm) ranges.
  • Xnew (np.c_): Flattens and pairs the grid elements into a matrix of 10,000 distinct 2D spatial coordinates.
  • lrmc.predict_proba(Xnew): Computes a three-column probability matrix for each point. This mapping determines the exact multi-class boundaries by evaluating the localized likelihood for Setosa, Versicolor, and Virginica simultaneously.
In [29]:
x0, x1 = np.meshgrid(
    np.linspace(3,8,100).reshape(-1,1),
    np.linspace(0,6,100).reshape(-1,1)
)
Xnew = np.c_[x0.ravel(), x1.ravel()]
yPred = lrmc.predict_proba(Xnew)
In [30]:
plt.figure(figsize=(10,4))
plt.plot(X[y==0,0], X[y==0,1],'.b',label='Setosa')
plt.plot(X[y==1,0], X[y==1,1],'+g',label='Versi')
plt.plot(X[y==2,0], X[y==2,1],'*m',label='Virgi')
zz=yPred[:,1].reshape(x0.shape)
contour=plt.contour(x0,x1,zz)
plt.clabel(contour, inline=1,fontsize=15)
plt.xlabel("Sepal Length")
plt.ylabel("Sepal Width")
plt.legend()
plt.show()
No description has been provided for this image

This plot maps the continuous probability distribution of the middle class within the multi-class decision space:

  • Three-Class Distribution: Displays all species simultaneously using distinct markers: blue dots for Setosa ($y=0$), green pluses for Versicolor ($y=1$), and magenta stars for Virginica ($y=2$).
  • Target Class Extraction (yPred[:, 1]): Slicing the second column of the probability matrix isolates and tracks the specific localized likelihood of a sample being Iris versicolor.
  • Localized Probability Ridge: Unlike the linear boundaries seen in binary classification, the multinomial model creates a bounded peak or "ridge" to isolate the middle class. The highest contour line (0.90) tightly encapsulates the core Versicolor cluster, dropping off systematically as the features move toward Setosa (left) or Virginica (right) territories.
In [31]:
yPred = lrmc.predict(Xnew)
plt.figure(figsize=(10,6))
plt.plot(X[y==0,0], X[y==0,1],'bs',label='Setosa')
plt.plot(X[y==1,0], X[y==1,1],'g^',label='Versi')
plt.plot(X[y==2,0], X[y==2,1],'*m',label='Virgi')
zz=yPred.reshape(x0.shape)
contour=plt.contourf(x0,x1,zz, cmap='jet', alpha=0.3)
plt.clabel(contour, inline=1,fontsize=15)
plt.xlabel("Sepal Length")
plt.ylabel("Sepal Width")
plt.legend()
plt.show()
No description has been provided for this image

This plot visualizes the ultimate classification boundaries by partitioning the entire 2D feature space into hard decision zones:

  • Hard Class Assignment (lrmc.predict): Converts continuous probabilities into discrete class verdicts (0, 1, or 2) by applying an argmax function (selecting the class with the highest probability for each point).
  • Filled Decision Regions (plt.contourf): Shades the coordinate plane into three distinct, solid zones using the jet colormap:
    • Blue Region: Absolute classification space for Iris setosa.
    • Green Region: Absolute classification space for Iris versicolor.
    • Red/Orange Region: Absolute classification space for Iris virginica.
  • Boundary Analysis: The sharp geometric intersections between the colored blocks define the definitive decision thresholds. This layout explicitly reveals how the linear multi-class model manages the regional trade-offs and handles the spatial overlap between the Versicolor and Virginica samples.
In [35]:
fig, ax =plt.subplots(subplot_kw={"projection": "3d"})
surf = ax.plot_surface(x0,x1,zz, cmap='jet')
ax.scatter(iris.data[:,0:1], iris.data[:,1:2], y, 'or')
Out[35]:
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x2a3c02451d0>
No description has been provided for this image

This cell integrates the actual dataset samples into the 3D hard decision space to visually evaluate the multi-class model's accuracy:

  • Discrete Vertical Alignment ($Z$): Both the staircase surface and the scatter markers use the integer multi-class taxonomy ($0$ for Setosa, $1$ for Versicolor, and $2$ for Virginica) instead of continuous probabilities.
  • Stepped Surface (plot_surface): Represents the geometric boundaries computed by the model. Each level dictates the categorical verdict zone based on the combination of sepal features.
  • True Labels Scatter (ax.scatter): Plots the real flower samples at their actual feature coordinates and true species height. This allows immediate visual verification of performance: samples resting on their matching colored step are correctly classified, while those caught on the wrong tier highlight the exact instances of classification error caused by spatial overlap.
</html>