Optimal parameters are:

$$ \hat{a} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} $$

and

$$ \hat{b} = \bar{y} - \hat{a}\,\bar{x}, $$

where

$$ \operatorname{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \qquad \operatorname{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. $$
Set some parameters:
import random
n = 100
a_true, b_true = 2.0, 1.0
noise_std = 1.0
Create a simulated dataset:
X_train = [random.uniform(0, 10) for _ in range(n)]
y_train = [a_true * x + b_true + random.gauss(0, noise_std) for x in X_train]
Implement mean, variance, covariance manually:
def mean(values):
    return sum(values) / len(values)

def covariance(x, y, x_mean, y_mean):
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / len(x)

def variance(values, mean_value):
    return sum((v - mean_value) ** 2 for v in values) / len(values)
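One quick consistency check on these helpers (a small sketch, not part of the original): the covariance of a list with itself is, term by term, the same sum as its variance, so the two must agree exactly.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(x, y, x_mean, y_mean):
    return sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / len(x)

def variance(values, mean_value):
    return sum((v - mean_value) ** 2 for v in values) / len(values)

data = [1.0, 2.0, 4.0, 7.0]
m = mean(data)
# Cov(x, x) and Var(x) compute the same sum, so the results match exactly
same = covariance(data, data, m, m) == variance(data, m)
```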
x_mean, y_mean = mean(X_train), mean(y_train)
a = covariance(X_train, y_train, x_mean, y_mean) / variance(X_train, x_mean)
b = y_mean - a * x_mean
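A quick way to check the pure-Python fit (a sketch, not part of the original; it fixes a seed so the check is reproducible) is to confirm that the recovered slope and intercept land near the true values used to generate the data:

```python
import random

random.seed(0)  # fixed seed for reproducibility (an assumption for this check)
n = 100
a_true, b_true = 2.0, 1.0
noise_std = 1.0

X_train = [random.uniform(0, 10) for _ in range(n)]
y_train = [a_true * x + b_true + random.gauss(0, noise_std) for x in X_train]

x_mean = sum(X_train) / n
y_mean = sum(y_train) / n
cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(X_train, y_train)) / n
var = sum((xi - x_mean) ** 2 for xi in X_train) / n

a = cov / var           # should be close to a_true = 2.0
b = y_mean - a * x_mean  # should be close to b_true = 1.0
```

With 100 points and unit noise, the estimates should fall well within a few tenths of the true parameters.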
Set some parameters:
import numpy as np
np.random.seed(42)
n_samples = 50
true_a, true_b = 2.5, 1.0
Create toy dataset:
x = np.random.uniform(0, 10, size=n_samples)
noise = np.random.normal(0, 2, size=n_samples)
y = true_a * x + true_b + noise
Compute the closed-form solution using the NumPy API:
x_mean = np.mean(x)
y_mean = np.mean(y)
cov_xy = np.mean((x - x_mean) * (y - y_mean))
var_x = np.mean((x - x_mean) ** 2)
a = cov_xy / var_x
b = y_mean - a * x_mean
We only used the np.mean function here; are there other ways? NumPy also offers np.var and np.cov for these quantities, as well as higher-level fitters such as np.polyfit and np.linalg.lstsq.
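As an illustration (a sketch, not part of the original text), np.polyfit with degree 1 and np.linalg.lstsq on a design matrix both solve the same least-squares problem and return the same slope and intercept as the closed form above:

```python
import numpy as np

np.random.seed(42)
x = np.random.uniform(0, 10, size=50)
y = 2.5 * x + 1.0 + np.random.normal(0, 2, size=50)

# Closed form from the section above
x_mean, y_mean = x.mean(), y.mean()
a = np.mean((x - x_mean) * (y - y_mean)) / np.mean((x - x_mean) ** 2)
b = y_mean - a * x_mean

# np.polyfit with deg=1 returns [slope, intercept]
a_pf, b_pf = np.polyfit(x, y, deg=1)

# np.linalg.lstsq on the design matrix [x, 1]
A = np.column_stack([x, np.ones_like(x)])
(a_ls, b_ls), *_ = np.linalg.lstsq(A, y, rcond=None)
```

All three pairs of coefficients agree to floating-point precision.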
Generate some test inputs:
x_test = np.linspace(0, 10, 100)
y_pred = a * x_test + b
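As a sanity check (a sketch, not part of the original; it rebuilds the dataset above with the same seed), we can measure the training mean squared error, which should land near the noise variance (here 2² = 4):

```python
import numpy as np

np.random.seed(42)
n_samples = 50
true_a, true_b = 2.5, 1.0

x = np.random.uniform(0, 10, size=n_samples)
y = true_a * x + true_b + np.random.normal(0, 2, size=n_samples)

x_mean, y_mean = x.mean(), y.mean()
a = np.mean((x - x_mean) * (y - y_mean)) / np.mean((x - x_mean) ** 2)
b = y_mean - a * x_mean

# Training MSE: should be close to the noise variance, 2**2 = 4
mse = np.mean((y - (a * x + b)) ** 2)
```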
Plot the result with matplotlib:
import matplotlib.pyplot as plt
plt.scatter(x, y, label="Noisy data")
plt.plot(x_test, true_a * x_test + true_b, color="green", linestyle="--", label="True line")
plt.plot(x_test, y_pred, color="red", label="Fitted line")
plt.legend()
plt.show()
We now have the fitted line as the vectors x_test and y_pred.
Implement this in pure Python and/or NumPy!