You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

150 KiB

Raw Permalink Blame History

None <html lang="en"> <head> </head>

Lab Example 1 — Sampling variability and CLT¶

Objective: show that individual measurements vary, but sample means are more stable.

Students should:

Generate a non-normal population.
Take many random samples.
Compute the mean of each sample.
Plot the distribution of the sample means.

In [2]:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(10)

population = np.random.exponential(scale=2.0, size=100000)

sample_size = 30
n_samples = 1000

sample_means = []

for i in range(n_samples):
    sample = np.random.choice(population, size=sample_size, replace=True)
    sample_means.append(sample.mean())

plt.hist(population, bins=40)
plt.title("Original population")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

plt.hist(sample_means, bins=40)
plt.title("Distribution of sample means")
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.show()

No description has been provided for this image

Even when the original data are not normally distributed, the distribution of sample means tends to become approximately normal when the sample size increases.

Lab Example 2 — Confidence interval for a mean¶

Objective: estimate a parameter with uncertainty.

Use the repeated measurement example:

In [4]:

import numpy as np
from scipy import stats

measurements = np.array([9.91, 10.03, 9.98, 10.05, 10.01, 9.96, 10.08, 9.94,
                         10.02, 9.99, 10.04, 9.97, 10.06, 9.95, 10.00])

n = len(measurements)
xbar = measurements.mean()
s = measurements.std(ddof=1)

confidence = 0.95
alpha = 1 - confidence

t_critical = stats.t.ppf(1 - alpha/2, df=n-1)
margin_error = t_critical * s / np.sqrt(n)

ci_lower = xbar - margin_error
ci_upper = xbar + margin_error

print("Mean:", xbar)
print("Standard deviation:", s)
print("95% confidence interval:", ci_lower, ci_upper)

Mean: 9.999333333333333
Standard deviation: 0.04832430129073875
95% confidence interval: 9.972572227269502 10.026094439397163

Interpretation:

We do not report only the mean. We report the mean with an interval that reflects uncertainty.

Lab Example 3 — Hypothesis test for a nominal value¶

Objective: test whether the measured component is statistically consistent with a reference value.

For the resistor example:

$H_0$: $\mu = 10$

$H_a$: $\mu \neq 10$

Python:

In [7]:

from scipy import stats

mu0 = 10.0

t_statistic, p_value = stats.ttest_1samp(measurements, popmean=mu0)

print("t statistic:", t_statistic)
print("p-value:", p_value)

t statistic: -0.05343044448672866
p-value: 0.9581438956150026

If p-value < 0.05, reject H0. If p-value >= 0.05, do not reject H0.

A hypothesis test does not prove that the true resistance is exactly 10,\Omega. It only evaluates whether the observed difference is large compared with random variation.

Lab Example 4 — Comparing two measurement methods¶

Objective: connect with engineering experiments.

Two instruments measure the same nominal resistor. Determine whether their mean readings are different.

In [9]:

method_A = np.array([10.01, 9.98, 10.03, 9.97, 10.02, 10.00, 10.04, 9.99])
method_B = np.array([10.08, 10.05, 10.07, 10.03, 10.06, 10.09, 10.04, 10.08])

In [11]:

t_statistic, p_value = stats.ttest_ind(method_A, method_B, equal_var=False)

print("t statistic:", t_statistic)
print("p-value:", p_value)

t statistic: -5.019011475427952
p-value: 0.0001995796758233766

In [12]:

plt.boxplot([method_A, method_B], labels=["Method A", "Method B"])
plt.ylabel("Resistance [ohms]")
plt.title("Comparison of measurement methods")
plt.show()

/var/folders/0c/2g7zyqyd3lj21c3yr8x3168m0000gn/T/ipykernel_3163/4104677222.py:1: MatplotlibDeprecationWarning: The 'labels' parameter of boxplot() has been renamed 'tick_labels' since Matplotlib 3.9; support for the old name will be dropped in 3.11.
  plt.boxplot([method_A, method_B], labels=["Method A", "Method B"])

Assuming the test was:

$H_0$: $\mu_A = \mu_B$

$H_a$: $\mu_A \neq \mu_B$

and using:

$\alpha = 0.05$

then:

$p = 0.0001996 < 0.05$

Therefore, you reject $H_0$.

At the 5% significance level, the two-sample t-test provides sufficient evidence to conclude that the mean measurements obtained with Method A and Method B are different. Since the test statistic is negative, Method A has a lower sample mean than Method B.

This result may suggest that the two instruments or methods are not equivalent. The difference could be due to:

calibration offset,
systematic bias,
different sensor accuracy,
different measurement procedure,
environmental or operator effects.

Lab Example 5 — Bootstrap confidence interval¶

Objective: show computational inference without relying only on formulas.

Use the same measurement dataset:

In [14]:

np.random.seed(10)

B = 5000
bootstrap_means = []

for i in range(B):
    bootstrap_sample = np.random.choice(measurements, size=n, replace=True)
    bootstrap_means.append(bootstrap_sample.mean())

bootstrap_means = np.array(bootstrap_means)

ci_bootstrap = np.percentile(bootstrap_means, [2.5, 97.5])

print("Bootstrap 95% CI:", ci_bootstrap)

Bootstrap 95% CI: [ 9.976      10.02268333]

In [15]:

plt.hist(bootstrap_means, bins=40)
plt.axvline(ci_bootstrap[0], linestyle="--")
plt.axvline(ci_bootstrap[1], linestyle="--")
plt.xlabel("Bootstrap mean")
plt.ylabel("Frequency")
plt.title("Bootstrap distribution of the mean")
plt.show()

In [17]:

#Lab Example 7 — Bootstrap confidence interval for the mean

#Goal: connect directly with your previous lab.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

data = np.array([12.1, 11.8, 10.9, 13.0, 12.5, 11.7, 12.9, 13.2, 11.5, 12.4])

B = 5000
bootstrap_means = []

for _ in range(B):
    sample_boot = np.random.choice(data, size=len(data), replace=True)
    bootstrap_means.append(np.mean(sample_boot))

bootstrap_means = np.array(bootstrap_means)

ci_low, ci_high = np.percentile(bootstrap_means, [2.5, 97.5])

plt.hist(bootstrap_means, bins=30, edgecolor="black")
plt.axvline(ci_low, linestyle="--", label="2.5%")
plt.axvline(ci_high, linestyle="--", label="97.5%")
plt.axvline(data.mean(), linestyle="-", label="Original mean")
plt.xlabel("Bootstrap means")
plt.ylabel("Frequency")
plt.title("Bootstrap distribution of the mean")
plt.legend()
plt.show()

print("Original mean:", data.mean())
print("95% bootstrap CI:", ci_low, ci_high)

Original mean: 12.2
95% bootstrap CI: 11.77 12.620000000000001

In [18]:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

n = 40
beta_0 = 3
beta_1 = 2
sigma = 4

x = np.linspace(0, 10, n)
error = np.random.normal(0, sigma, n)
y = beta_0 + beta_1 * x + error

X = np.column_stack((np.ones(n), x))

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

b0_hat, b1_hat = beta_hat

y_hat = b0_hat + b1_hat * x

plt.scatter(x, y, label="Data")
plt.plot(x, y_hat, label="OLS fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.title("OLS estimation")
plt.legend()
plt.show()

print("True beta_0:", beta_0)
print("Estimated beta_0:", b0_hat)
print("True beta_1:", beta_1)
print("Estimated beta_1:", b1_hat)

True beta_0: 3
Estimated beta_0: 3.717585051473372
True beta_1: 2
Estimated beta_1: 1.6815735189709737

</html>

150 KiB Raw Permalink Blame History