diff --git a/README.md b/README.md
index 28309cc..242d7c5 100644
--- a/README.md
+++ b/README.md
@@ -63,72 +63,6 @@ iris=datasets.load_iris()
print(iris.DESCR)
```
- .. _iris_dataset:
-
- Iris plants dataset
- --------------------
-
- **Data Set Characteristics:**
-
- :Number of Instances: 150 (50 in each of three classes)
- :Number of Attributes: 4 numeric, predictive attributes and the class
- :Attribute Information:
- - sepal length in cm
- - sepal width in cm
- - petal length in cm
- - petal width in cm
- - class:
- - Iris-Setosa
- - Iris-Versicolour
- - Iris-Virginica
-
- :Summary Statistics:
-
- ============== ==== ==== ======= ===== ====================
- Min Max Mean SD Class Correlation
- ============== ==== ==== ======= ===== ====================
- sepal length: 4.3 7.9 5.84 0.83 0.7826
- sepal width: 2.0 4.4 3.05 0.43 -0.4194
- petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)
- petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)
- ============== ==== ==== ======= ===== ====================
-
- :Missing Attribute Values: None
- :Class Distribution: 33.3% for each of 3 classes.
- :Creator: R.A. Fisher
- :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
- :Date: July, 1988
-
- The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
- from Fisher's paper. Note that it's the same as in R, but not as in the UCI
- Machine Learning Repository, which has two wrong data points.
-
- This is perhaps the best known database to be found in the
- pattern recognition literature. Fisher's paper is a classic in the field and
- is referenced frequently to this day. (See Duda & Hart, for example.) The
- data set contains 3 classes of 50 instances each, where each class refers to a
- type of iris plant. One class is linearly separable from the other 2; the
- latter are NOT linearly separable from each other.
-
- .. dropdown:: References
-
- - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
- Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
- Mathematical Statistics" (John Wiley, NY, 1950).
- - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
- (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
- - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
- Structure and Classification Rule for Recognition in Partially Exposed
- Environments". IEEE Transactions on Pattern Analysis and Machine
- Intelligence, Vol. PAMI-2, No. 1, 67-71.
- - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
- on Information Theory, May 1972, 431-433.
- - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II
- conceptual clustering system finds 3 classes in the data.
- - Many, many more ...
-
-
-
### Step 3: Exploratory Data Analysis & Target Inspection
Prior to model optimization, a visual and structural inspection evaluates the data distribution:
@@ -150,29 +84,10 @@ plt.show()
```
-
-
-
-
-
-
```python
iris.target
```
-
-
-
- array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
- 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
- 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
-
-
-
### Step 4: Theoretical Sigmoid Function & Decision Boundary
Generating a synthetic domain from -10 to 10 to plot the standalone mathematical Sigmoid function:
@@ -190,12 +105,6 @@ plt.legend(loc='upper left', fontsize=20)
plt.show()
```
-
-
-
-
-
-
## Model Training and Benchmark Evaluation
### Model 1: Iris-Setosa Classifier based on petal width
@@ -213,19 +122,6 @@ y = (iris.target == 0).astype(int)
y
```
-
-
-
- array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
- 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
- 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
-
-
-
#### Benchmark Model Initialization and Fitting
This cell instantiates and trains the baseline classification model using Scikit-Learn:
@@ -241,1262 +137,6 @@ mylr.fit(x,y)
```
-
-
-
-
-
-
-
```python
Xnew = np.linspace(-1,3,100).reshape(-1,1)
yPred = mylr.predict_proba(Xnew)
@@ -1507,12 +147,6 @@ plt.plot(x,y,'*g')
plt.show()
```
-
-
-
-
-
-
This plot visualizes the trained model's Sigmoid prediction curve over the experimental dataset samples:
* **Sample Distribution (Green Stars):** Represents the real dataset. Small petal widths (0.1 - 0.6 cm) belong to *Iris setosa* ($y=1$), while larger widths (1.0 - 2.5 cm) belong to the other species ($y=0$).
@@ -1542,2557 +176,39 @@ mylr.fit(x,y)
```
+```python
+Xnew = np.linspace(0,8,100).reshape(-1,1)
+yPred = mylr.predict_proba(Xnew)
+#plt.plot(Xnew, yPred[:,0], label= 'No Iris')
+plt.plot(Xnew, yPred[:,1], label= 'Yes Iris')
+plt.legend()
+plt.plot(x,y,'*g')
+plt.axis([1.5, 5, -0.5, 1.5])
+plt.show()
+```
+This plot illustrates the performance of the second univariable model using **Petal Length**:
-
-
-
-
-
-```python
-Xnew = np.linspace(0,8,100).reshape(-1,1)
-yPred = mylr.predict_proba(Xnew)
-#plt.plot(Xnew, yPred[:,0], label= 'No Iris')
-plt.plot(Xnew, yPred[:,1], label= 'Yes Iris')
-plt.legend()
-plt.plot(x,y,'*g')
-plt.axis([1.5, 5, -0.5, 1.5])
-plt.show()
-```
-
-
-
-
-
-
-
-This plot illustrates the performance of the second univariable model using **Petal Length**:
-
-* **Sample Distribution:** Samples with short petal lengths (1.0 - 2.0 cm) are correctly clustered as *Iris setosa* ($y=1$), while samples with larger lengths ($>3.0$ cm) map to $y=0$.
-* **Sigmoid Mapping:** The descending blue curve demonstrates that as petal length increases, the probability of the sample being *Iris setosa* drops sharply from 1.0 to 0.0.
-* **Decision Boundary:** The curve crosses the 0.5 probability threshold at approximately 2.5 cm, marking the exact inflection point that separates the target class from the rest of the dataset.
-
-### Model 3: Iris-Setosa Classifier based on Sepal length
-
-#### Feature Shift – Sepal Length Isolation
-
-The model evaluates a third morphological predictor independently:
-* **Feature Vector ($X$):** Slicing index `[:, 0:1]` isolates **Sepal Length** as the continuous independent variable.
-* **Target Continuity ($y$):** The objective remains focused on **Iris setosa** ($y=1$) to compare the separation power of sepal dimensions against the previous petal metrics.
-
-
-```python
-x = iris.data[:, 0:1]
-y = (iris.target == 0).astype(int)
-from sklearn.linear_model import LogisticRegression
-mylr = LogisticRegression(solver='lbfgs', random_state=42)
-mylr.fit(x,y)
-```
-
-
-
-
-
+The model evaluates a third morphological predictor independently:
+* **Feature Vector ($X$):** Slicing index `[:, 0:1]` isolates **Sepal Length** as the continuous independent variable.
+* **Target Continuity ($y$):** The objective remains focused on **Iris setosa** ($y=1$) to compare the separation power of sepal dimensions against the previous petal metrics.
+```python
+x = iris.data[:, 0:1]
+y = (iris.target == 0).astype(int)
+from sklearn.linear_model import LogisticRegression
+mylr = LogisticRegression(solver='lbfgs', random_state=42)
+mylr.fit(x,y)
+```
```python
@@ -4102,1331 +218,63 @@ yPred = mylr.predict_proba(Xnew)
plt.plot(Xnew, yPred[:,1], label= 'Yes Iris')
plt.legend()
plt.plot(x,y,'*g')
-plt.axis([3.5, 7, -0.1, 1.1])
-plt.show()
-```
-
-
-
-
-
-
-
-This plot displays the performance of the third univariable model using **Sepal Length**:
-
-* **Sample Distribution:** Samples representing *Iris setosa* ($y=1$) are concentrated at shorter lengths, but show a much higher spatial overlap with non-setosa samples ($y=0$) compared to the previous petal features.
-* **Sigmoid Mapping:** The descending curve shows the probability dropping as sepal length increases. Due to this significant data overlap, the slope is less steep, indicating a more gradual and less aggressive probabilistic transition.
-* **Decision Boundary:** The inflection point at $\sigma = 0.5$ establishes the final threshold. This boundary carries more classification uncertainty because sepal dimensions are naturally less distinct between these species.
-
-### Model 4: Multiple features classifier
-
-#### Multi-Class Spatial Mapping (Sepal Features)
-
-This cell upgrades the initial exploratory plot by adding the ground-truth class labels to the 2D sepal feature space:
-
-* **Feature Interaction:** Maps Sepal Length (`sl`) against Sepal Width (`sw`) simultaneously to analyze their combined distribution.
-* **Class Color-Coding:** Differentiates the three original species using distinct markers: Green for *Setosa*, Red for *Versicolor*, and Blue for *Virginica*.
-* **Visual Separability Analysis:** Allows immediate observation of the data structure, showing that while *Setosa* forms a perfectly isolated cluster, *Versicolor* and *Virginica* exhibit significant spatial overlap, justifying the need for optimization models.
-
-
-```python
-import matplotlib.pyplot as plt
-sl = iris.data[:,0:1]
-sw = iris.data[:,1:2]
-tg = iris.target
-plt.plot(sl[tg==0,0], sw[tg==0,0],'.g' ,label='Set')
-plt.plot(sl[tg==1,0], sw[tg==1,0],'.r', label='Ver')
-plt.plot(sl[tg==2,0], sw[tg==2,0],'.b', label='Vir')
-plt.legend()
-plt.show()
-```
-
-
-
-
-
-
-
-#### Bivariate Model Training for Iris Virginica
-
-This cell configures and trains a multi-feature logistic regression model utilizing tuned optimization parameters:
-
-* **Bivariate Data Selection:** * **Features (`X`):** Slices index `[:, 0:2]` to combine **Sepal Length** and **Sepal Width** into a two-dimensional feature space.
- * **Target (`y`):** Shifts the positive class focus exclusively to **Iris virginica** (`iris.target == 2`).
-* **Hyperparameter Tuning (`mylrvir`):**
- * **`solver='newton-cg'`**: Uses the Newton-Conjugate Gradient method to compute accurate optimization paths.
- * **`C=100` & `tol=1e-5`**: Applies high cost (low regularization) to allow a tighter fit to the data, paired with a strict tolerance for precise convergence.
-* **`mylrvir.fit(X, y)`**: Trains the system to find the optimal weight vector $w = [w_1, w_2]$ and bias ($b$), establishing the multi-variable benchmark line.
-
-
-```python
-X = iris.data[:,0:2]
-y = (iris.target==2).astype(int)
-mylrvir = LogisticRegression(
- random_state=22,
- tol=1e-5,
- C=100,
- max_iter=100,
- solver='newton-cg'
-)
-mylrvir.fit(X,y)
-```
-
-
-
-
-
+* **Bivariate Data Selection:**
+ * **Features (`X`):** Slices index `[:, 0:2]` to combine **Sepal Length** and **Sepal Width** into a two-dimensional feature space.
+ * **Target (`y`):** Shifts the positive class focus exclusively to **Iris virginica** (`iris.target == 2`).
+* **Hyperparameter Tuning (`mylrvir`):**
+ * **`solver='newton-cg'`**: Uses the Newton-Conjugate Gradient method to compute accurate optimization paths.
+ * **`C=100` & `tol=1e-5`**: Applies high cost (low regularization) to allow a tighter fit to the data, paired with a strict tolerance for precise convergence.
+* **`mylrvir.fit(X, y)`**: Trains the system to find the optimal weight vector $w = [w_1, w_2]$ and bias ($b$), establishing the multi-variable benchmark line.
+```python
+X = iris.data[:,0:2]
+y = (iris.target==2).astype(int)
+mylrvir = LogisticRegression(
+ random_state=22,
+ tol=1e-5,
+ C=100,
+ max_iter=100,
+ solver='newton-cg'
+)
+mylrvir.fit(X,y)
+```
#### Coordinate Grid Generation & Probability Mapping
@@ -5460,12 +308,6 @@ plt.legend()
plt.show()
```
-
-
-
-
-
-
This plot visualizes the continuous probability space generated by the trained bivariate model:
* **Sample Distribution:** Blue squares represent non-virginica samples ($y=0$), and green triangles represent *Iris virginica* ($y=1$) mapped across Sepal Length and Sepal Width.
@@ -5479,19 +321,6 @@ surf = ax.plot_surface(x0,x1,zz, cmap='jet')
ax.scatter(iris.data[:,0:1], iris.data[:,1:2], y, 'or')
```
-
-
-
-
-
-
-
-
-
-
-
-
-
This cell projects the bivariate logistic regression model into a 3D coordinate space to visualize the complete probability landscape:
* **Axis Dimensions:** The horizontal axes represent **Sepal Length** ($x_1$) and **Sepal Width** ($x_2$), while the vertical axis ($Z$) tracks the continuous model probability $\sigma(z) \in [0, 1]$.
@@ -5520,1264 +349,6 @@ lrmc = LogisticRegression(
lrmc.fit(X,y)
```
-
-
-
-
-
-
-
#### Multi-Class Grid Generation and Probability Evaluation
This cell sets up the coordinate testing matrix to evaluate the multi-class prediction behavior across the entire sepal feature space:
@@ -6811,12 +382,6 @@ plt.legend()
plt.show()
```
-
-
-
-
-
-
This plot maps the continuous probability distribution of the middle class within the multi-class decision space:
* **Three-Class Distribution:** Displays all species simultaneously using distinct markers: blue dots for *Setosa* ($y=0$), green pluses for *Versicolor* ($y=1$), and magenta stars for *Virginica* ($y=2$).
@@ -6839,12 +404,6 @@ plt.legend()
plt.show()
```
-
-
-
-
-
-
This plot visualizes the ultimate classification boundaries by partitioning the entire 2D feature space into hard decision zones:
* **Hard Class Assignment (`lrmc.predict`):** Converts continuous probabilities into discrete class verdicts (`0`, `1`, or `2`) by applying an *argmax* function (selecting the class with the highest probability for each point).
@@ -6861,19 +420,6 @@ surf = ax.plot_surface(x0,x1,zz, cmap='jet')
ax.scatter(iris.data[:,0:1], iris.data[:,1:2], y, 'or')
```
-
-
-
-
-
-
-
-
-
-
-
-
-
This cell integrates the actual dataset samples into the 3D hard decision space to visually evaluate the multi-class model's accuracy:
* **Discrete Vertical Alignment ($Z$):** Both the staircase surface and the scatter markers use the integer multi-class taxonomy ($0$ for *Setosa*, $1$ for *Versicolor*, and $2$ for *Virginica*) instead of continuous probabilities.