Gradient descent
As an exercise, we now implement the gradient descent algorithm to optimize the Ridge regression loss. The gradient has a closed-form expression and can be efficiently computed. We implement it in two different ways as we did for the loss function.
Gradient descent implementation
Gradient descent iteratively updates the parameter β using the gradient of the loss function:

$$ \beta \leftarrow \beta - \eta \, \nabla_\beta L(\beta) $$

where η is the learning rate and ∇βL(β) is the gradient of the loss.
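For the single-coefficient setting used in the code below, the Ridge loss and its closed-form gradient take the form:

$$
L(\beta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \beta x_i\right)^2 + \lambda_2 \beta^2,
\qquad
\nabla_\beta L(\beta) = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - \beta x_i\right) x_i + 2\lambda_2 \beta.
$$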
We allow flexible experimentation by passing the gradient function as a parameter:
```rust
pub fn gradient_descent(
    grad_fn: impl Fn(&[f64], &[f64], f64, f64) -> f64,
    x: &[f64],
    y: &[f64],
    lambda2: f64,
    lr: f64,
    n_iters: usize,
    init_beta: f64,
) -> f64 {
    let mut beta = init_beta;
    for _ in 0..n_iters {
        // Evaluate the gradient at the current beta and step against it.
        let grad = grad_fn(x, y, beta, lambda2);
        beta -= lr * grad;
    }
    beta
}
```
This version is generic, letting us plug in any valid `grad_fn`.
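As a quick illustration of that flexibility, here is a sketch that plugs in an ad-hoc closure computing only the unregularized MSE gradient; the data, hyperparameters, and the `demo_plug_in_closure` name are our own illustrative choices, not from the text.

```rust
// Illustrative sketch: because `grad_fn` is a plain `impl Fn`, any closure with
// the matching signature can be passed in -- here, a gradient that ignores the
// regularization term entirely.
fn demo_plug_in_closure() {
    let x = vec![1.0, 2.0, 3.0];
    let y = vec![2.0, 4.0, 6.0];

    // Gradient of the unregularized mean squared error only.
    let grad_mse_only = |x: &[f64], y: &[f64], beta: f64, _lambda2: f64| -> f64 {
        let n = x.len() as f64;
        x.iter()
            .zip(y.iter())
            .map(|(xi, yi)| -2.0 * (yi - beta * xi) * xi)
            .sum::<f64>()
            / n
    };

    let beta_hat = gradient_descent(grad_mse_only, &x, &y, 0.0, 0.05, 1_000, 0.0);
    println!("estimated beta: {beta_hat}"); // converges towards 2.0 for this data
}
```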
Gradient function: naive implementation
This version breaks the computation into three separate steps:
- Compute the residuals
- Compute the dot product between the residuals and the inputs
- Assemble these, together with the regularization term, to get the gradient value
We start by implementing our own `dot` function, relying on iterators, map chaining, and summing the results.
```rust
pub fn dot(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "Input vectors must have the same length");
    a.iter().zip(b.iter()).map(|(xi, yi)| xi * yi).sum()
}
```
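A small sanity check of the helper (the example values are ours):

```rust
// Quick check of `dot`: 1*4 + 2*5 + 3*6 = 32.
fn dot_example() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    assert_eq!(dot(&a, &b), 32.0);
}
```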
Our first implementation takes the following form:
```rust
pub fn grad_loss_function_naive(x: &[f64], y: &[f64], beta: f64, lambda2: f64) -> f64 {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    let n: usize = x.len();

    // Step 1: materialize the residuals y_i - beta * x_i in a temporary vector.
    let residuals: Vec<f64> = x
        .iter()
        .zip(y.iter())
        .map(|(xi, yi)| yi - beta * xi)
        .collect();

    // Step 2: dot product between the residuals and the inputs.
    let residuals_dot_x = dot(&residuals, x);

    // Step 3: gradient of the MSE term plus the Ridge penalty term.
    -2.0 * residuals_dot_x / (n as f64) + 2.0 * lambda2 * beta
}
```
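To double-check the hand-derived expression, one option is to compare it against a central finite difference of the loss. Below is a minimal sketch; the inline `ridge_loss` closure and the sample values are our own reconstruction, written to match the gradient above.

```rust
// Sketch of a finite-difference check for the analytic gradient.
fn check_gradient_naive() {
    // Ridge loss assumed by the gradient code: (1/n) * sum((y - beta*x)^2) + lambda2 * beta^2.
    let ridge_loss = |x: &[f64], y: &[f64], beta: f64, lambda2: f64| -> f64 {
        let n = x.len() as f64;
        let mse: f64 = x
            .iter()
            .zip(y.iter())
            .map(|(xi, yi)| (yi - beta * xi).powi(2))
            .sum::<f64>()
            / n;
        mse + lambda2 * beta * beta
    };

    let (x, y) = (vec![1.0, 2.0, 3.0], vec![1.5, 3.1, 4.4]);
    let (beta, lambda2, eps) = (0.7, 0.1, 1e-6);

    // Central difference: (L(beta + eps) - L(beta - eps)) / (2 * eps).
    let numeric = (ridge_loss(&x, &y, beta + eps, lambda2)
        - ridge_loss(&x, &y, beta - eps, lambda2))
        / (2.0 * eps);
    let analytic = grad_loss_function_naive(&x, &y, beta, lambda2);

    assert!((numeric - analytic).abs() < 1e-6);
}
```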
Gradient function: inlined iterator-based implementation
In this version, we fuse the residual and gradient computation into a single iterator chain. This avoids intermediate memory allocations and takes full advantage of Rust’s zero-cost abstraction model.
```rust
pub fn grad_loss_function_inline(x: &[f64], y: &[f64], beta: f64, lambda2: f64) -> f64 {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    let n: usize = x.len();

    // Residual and gradient terms fused in one pass; no intermediate Vec is allocated.
    let grad_mse: f64 = x
        .iter()
        .zip(y.iter())
        .map(|(xi, yi)| 2.0 * (yi - beta * xi) * xi)
        .sum::<f64>()
        / (n as f64);

    -grad_mse + 2.0 * lambda2 * beta
}
```
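Since both functions implement the same formula, they should agree to within floating-point rounding. A quick sketch to verify this (the sample data are our own):

```rust
// Both gradient implementations should return the same value for the same inputs.
fn compare_gradient_impls() {
    let x = vec![0.5, 1.0, 1.5, 2.0];
    let y = vec![1.1, 1.9, 3.2, 3.9];
    let (beta, lambda2) = (0.8, 0.05);

    let g_naive = grad_loss_function_naive(&x, &y, beta, lambda2);
    let g_inline = grad_loss_function_inline(&x, &y, beta, lambda2);

    assert!((g_naive - g_inline).abs() < 1e-12);
}
```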
Key differences:
- The naive version allocates a temporary vector for the residuals before computing the dot product.
- The inlined version is more idiomatic Rust: it avoids allocation and achieves better performance through iterator fusion.
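Putting the pieces together, here is a minimal end-to-end sketch: it runs `gradient_descent` with `grad_loss_function_inline` and compares the result against the closed-form Ridge solution β* = Σxᵢyᵢ / (Σxᵢ² + nλ₂), which follows from setting the gradient above to zero. The synthetic data and hyperparameters are illustrative choices, not values from the text.

```rust
// End-to-end sketch: gradient descent vs. the closed-form Ridge solution
// beta* = sum(x*y) / (sum(x^2) + n*lambda2) for the scalar-beta loss used here.
fn main() {
    let x: Vec<f64> = (1..=20).map(|i| i as f64 / 10.0).collect();
    let y: Vec<f64> = x.iter().map(|xi| 3.0 * xi + 0.1).collect();
    let lambda2 = 0.1;

    let beta_gd = gradient_descent(
        grad_loss_function_inline,
        &x,
        &y,
        lambda2,
        0.1,   // learning rate
        5_000, // iterations
        0.0,   // initial beta
    );

    let n = x.len() as f64;
    let sum_xy: f64 = x.iter().zip(y.iter()).map(|(xi, yi)| xi * yi).sum();
    let sum_xx: f64 = x.iter().map(|xi| xi * xi).sum();
    let beta_closed_form = sum_xy / (sum_xx + n * lambda2);

    println!("gradient descent: {beta_gd:.6}");
    println!("closed form:      {beta_closed_form:.6}");
}
```

The two printed values should match to several decimal places, which is a convenient way to confirm that both the gradient and the descent loop are wired up correctly.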