150 KiB
Lab Example 1 — Sampling variability and CLT¶
Objective: show that individual measurements vary, but sample means are more stable.
Students should:
- Generate a non-normal population.
- Take many random samples.
- Compute the mean of each sample.
- Plot the distribution of the sample means.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(10)
population = np.random.exponential(scale=2.0, size=100000)
sample_size = 30
n_samples = 1000
sample_means = []
for i in range(n_samples):
sample = np.random.choice(population, size=sample_size, replace=True)
sample_means.append(sample.mean())
plt.hist(population, bins=40)
plt.title("Original population")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
plt.hist(sample_means, bins=40)
plt.title("Distribution of sample means")
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.show()
Even when the original data are not normally distributed, the distribution of sample means tends to become approximately normal when the sample size increases.
Lab Example 2 — Confidence interval for a mean¶
Objective: estimate a parameter with uncertainty.
Use the repeated measurement example:
import numpy as np
from scipy import stats
measurements = np.array([9.91, 10.03, 9.98, 10.05, 10.01, 9.96, 10.08, 9.94,
10.02, 9.99, 10.04, 9.97, 10.06, 9.95, 10.00])
n = len(measurements)
xbar = measurements.mean()
s = measurements.std(ddof=1)
confidence = 0.95
alpha = 1 - confidence
t_critical = stats.t.ppf(1 - alpha/2, df=n-1)
margin_error = t_critical * s / np.sqrt(n)
ci_lower = xbar - margin_error
ci_upper = xbar + margin_error
print("Mean:", xbar)
print("Standard deviation:", s)
print("95% confidence interval:", ci_lower, ci_upper)
Interpretation:
We do not report only the mean. We report the mean with an interval that reflects uncertainty.
Suggested question:
Does the confidence interval include $10\,\Omega$?
Lab Example 3 — Hypothesis test for a nominal value¶
Objective: test whether the measured component is statistically consistent with a reference value.
For the resistor example:
$H_0$: $\mu = 10$
$H_a$: $\mu \neq 10$
Python:
from scipy import stats
mu0 = 10.0
t_statistic, p_value = stats.ttest_1samp(measurements, popmean=mu0)
print("t statistic:", t_statistic)
print("p-value:", p_value)
If p-value < 0.05, reject H0. If p-value >= 0.05, do not reject H0.
A hypothesis test does not prove that the true resistance is exactly 10,\Omega. It only evaluates whether the observed difference is large compared with random variation.
Lab Example 4 — Comparing two measurement methods¶
Objective: connect with engineering experiments.
Two instruments measure the same nominal resistor. Determine whether their mean readings are different.
method_A = np.array([10.01, 9.98, 10.03, 9.97, 10.02, 10.00, 10.04, 9.99])
method_B = np.array([10.08, 10.05, 10.07, 10.03, 10.06, 10.09, 10.04, 10.08])
t_statistic, p_value = stats.ttest_ind(method_A, method_B, equal_var=False)
print("t statistic:", t_statistic)
print("p-value:", p_value)
plt.boxplot([method_A, method_B], labels=["Method A", "Method B"])
plt.ylabel("Resistance [ohms]")
plt.title("Comparison of measurement methods")
plt.show()
Assuming the test was:
$H_0$: $\mu_A = \mu_B$
$H_a$: $\mu_A \neq \mu_B$
and using:
$\alpha = 0.05$
then:
$p = 0.0001996 < 0.05$
Therefore, you reject $H_0$.
At the 5% significance level, the two-sample t-test provides sufficient evidence to conclude that the mean measurements obtained with Method A and Method B are different. Since the test statistic is negative, Method A has a lower sample mean than Method B.
This result may suggest that the two instruments or methods are not equivalent. The difference could be due to:
- calibration offset,
- systematic bias,
- different sensor accuracy,
- different measurement procedure,
- environmental or operator effects.
Lab Example 5 — Bootstrap confidence interval¶
Objective: show computational inference without relying only on formulas.
Use the same measurement dataset:
np.random.seed(10)
B = 5000
bootstrap_means = []
for i in range(B):
bootstrap_sample = np.random.choice(measurements, size=n, replace=True)
bootstrap_means.append(bootstrap_sample.mean())
bootstrap_means = np.array(bootstrap_means)
ci_bootstrap = np.percentile(bootstrap_means, [2.5, 97.5])
print("Bootstrap 95% CI:", ci_bootstrap)
plt.hist(bootstrap_means, bins=40)
plt.axvline(ci_bootstrap[0], linestyle="--")
plt.axvline(ci_bootstrap[1], linestyle="--")
plt.xlabel("Bootstrap mean")
plt.ylabel("Frequency")
plt.title("Bootstrap distribution of the mean")
plt.show()
#Lab Example 7 — Bootstrap confidence interval for the mean
#Goal: connect directly with your previous lab.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
data = np.array([12.1, 11.8, 10.9, 13.0, 12.5, 11.7, 12.9, 13.2, 11.5, 12.4])
B = 5000
bootstrap_means = []
for _ in range(B):
sample_boot = np.random.choice(data, size=len(data), replace=True)
bootstrap_means.append(np.mean(sample_boot))
bootstrap_means = np.array(bootstrap_means)
ci_low, ci_high = np.percentile(bootstrap_means, [2.5, 97.5])
plt.hist(bootstrap_means, bins=30, edgecolor="black")
plt.axvline(ci_low, linestyle="--", label="2.5%")
plt.axvline(ci_high, linestyle="--", label="97.5%")
plt.axvline(data.mean(), linestyle="-", label="Original mean")
plt.xlabel("Bootstrap means")
plt.ylabel("Frequency")
plt.title("Bootstrap distribution of the mean")
plt.legend()
plt.show()
print("Original mean:", data.mean())
print("95% bootstrap CI:", ci_low, ci_high)
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
n = 40
beta_0 = 3
beta_1 = 2
sigma = 4
x = np.linspace(0, 10, n)
error = np.random.normal(0, sigma, n)
y = beta_0 + beta_1 * x + error
X = np.column_stack((np.ones(n), x))
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
b0_hat, b1_hat = beta_hat
y_hat = b0_hat + b1_hat * x
plt.scatter(x, y, label="Data")
plt.plot(x, y_hat, label="OLS fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.title("OLS estimation")
plt.legend()
plt.show()
print("True beta_0:", beta_0)
print("Estimated beta_0:", b0_hat)
print("True beta_1:", beta_1)
print("Estimated beta_1:", b1_hat)