ENSAI - 2A - Programmation algorithmique en Python - 2025/2026

Linear regression 📈

Pure Python vs NumPy implementation

Problem setup

We want to fit a line:

$$y = a\,x + b$$

Given data points $(x_1, y_1), \dots, (x_n, y_n)$, find $(a, b)$ that minimize the mean squared error:

$$\mathrm{MSE}(a, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (a\,x_i + b)\bigr)^2$$

Analytical solution

The optimal parameters are:

$$\hat{a} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and

$$\hat{b} = \bar{y} - \hat{a}\,\bar{x},$$

where $\bar{x}$ and $\bar{y}$ are the means of $x$ and $y$.
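
Why these are optimal, in brief: set both partial derivatives of the MSE to zero.

$$\frac{\partial\,\mathrm{MSE}}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\bigl(y_i - a\,x_i - b\bigr) = 0 \;\Rightarrow\; b = \bar{y} - a\,\bar{x}$$

$$\frac{\partial\,\mathrm{MSE}}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n} x_i\bigl(y_i - a\,x_i - b\bigr) = 0$$

Substituting the first condition into the second yields $\hat{a} = \mathrm{Cov}(x, y)/\mathrm{Var}(x)$.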

Pure Python approach: simulate toy data

Set some parameters:

import random

n = 100
a_true, b_true = 2.0, 1.0
noise_std = 1.0

Create a simulated dataset:

X_train = [random.uniform(0, 10) for _ in range(n)]
y_train = [a_true * x + b_true + random.gauss(0, noise_std) for x in X_train]

Computing slope & intercept

Implement mean, variance, covariance manually:

def mean(values):
    return sum(values) / len(values)

def covariance(x, y, x_mean, y_mean):
    return sum((xi - x_mean)*(yi - y_mean) for xi, yi in zip(x, y)) / len(x)

def variance(values, mean_value):
    return sum((v - mean_value)**2 for v in values) / len(values)

x_mean, y_mean = mean(X_train), mean(y_train)
a = covariance(X_train, y_train, x_mean, y_mean) / variance(X_train, x_mean)
b = y_mean - a * x_mean
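
A quick sanity check: the estimates should be close to the parameters used to simulate the data (x_new below is just an example input):

print(f"estimated a = {a:.3f} (true value {a_true})")
print(f"estimated b = {b:.3f} (true value {b_true})")

# Predict for a new, unseen input
x_new = 5.0
print(f"prediction at x = {x_new}: {a * x_new + b:.3f}")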

NumPy approach: generate toy data

Set some parameters:

import numpy as np

np.random.seed(42)
n_samples = 50

true_a, true_b = 2.5, 1.0

Create toy dataset:

x = np.random.uniform(0, 10, size=n_samples)
noise = np.random.normal(0, 2, size=n_samples)
y = true_a * x + true_b + noise

NumPy implementation

Compute the solution using NumPy API:

x_mean = np.mean(x)
y_mean = np.mean(y)

cov_xy = np.mean((x - x_mean) * (y - y_mean))
var_x = np.mean((x - x_mean) ** 2)

a = cov_xy / var_x
b = y_mean - a * x_mean

We only use the np.mean function here; are there other ways? One possibility is sketched below.
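
One possible alternative uses NumPy's built-in least-squares and covariance helpers (a sketch; the names a_fit, b_fit, a_cov, b_cov are just for illustration):

# Alternative 1: np.polyfit fits a degree-1 polynomial by least squares
a_fit, b_fit = np.polyfit(x, y, deg=1)

# Alternative 2: np.cov / np.var (ddof=0 keeps both as population versions)
a_cov = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
b_cov = np.mean(y) - a_cov * np.mean(x)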

Visualizing the fit

Generate some test inputs

x_test = np.linspace(0, 10, 100)
y_pred = a * x_test + b

Plot the result with matplotlib

import matplotlib.pyplot as plt

plt.scatter(x, y, label="Noisy data")
plt.plot(x_test, true_a * x_test + true_b, color="green", linestyle="--", label="True line")
plt.plot(x_test, y_pred, color="red", label="Fitted line")
plt.legend()
plt.show()

Too easy? Multidimensional linear regression 🔥

We now have input vectors $x_i \in \mathbb{R}^p$:

  • Still linear, but in higher dimensions
  • Same principle: minimize the MSE
  • Solution uses matrix algebra (see the sketch below): $\hat{\beta} = (X^\top X)^{-1} X^\top y$

Implement this in pure Python and/or NumPy!
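
A minimal NumPy sketch of the normal-equation approach (the data, dimensions, and variable names below are only illustrative; an intercept is handled by appending a column of ones to the design matrix):

import numpy as np

np.random.seed(0)
n, p = 200, 3

# Simulated multidimensional data
X = np.random.uniform(0, 10, size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 1.0 + np.random.normal(0, 1, size=n)

# Append a column of ones so the last coefficient is the intercept
X1 = np.hstack([X, np.ones((n, 1))])

# Normal equations: beta_hat = (X^T X)^{-1} X^T y
# (np.linalg.solve avoids forming the explicit inverse)
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Equivalent and numerically more robust: least squares
beta_lstsq, *_ = np.linalg.lstsq(X1, y, rcond=None)

With $p = 1$ this reduces to the slope/intercept formulas used earlier.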

Summary

  • Generated noisy toy data with a linear relationship
  • Implemented linear regression in two ways:
    • Pure Python (lists + loops)
    • NumPy (vectorized, analytical)
  • Derived slope & intercept from covariance and variance
  • Visualized fitted vs true line