Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Gradient descent

The GD struct implements basic gradient descent with a fixed step size:

#![allow(unused)]
fn main() {
pub struct GD {
    step_size: f64,
}
}

It has a constructor:

#![allow(unused)]
fn main() {
impl GD {
    pub fn new(step_size: f64) -> Self {
        Self { step_size }
    }
}
}

And implements Optimizer by subtracting the gradient scaled by the step size from the weights at each iteration.

#![allow(unused)]
fn main() {
impl Optimizer for GD {
    fn run(
        &self,
        weights: &mut Array1<f64>,
        grad_fn: impl Fn(&Array1<f64>) -> Array1<f64>,
        n_steps: usize,
    ) {
        for _ in 0..n_steps {
            let grads = grad_fn(weights);
            weights.zip_mut_with(&grads, |w, &g| {
                *w -= self.step_size * g;
            });
        }
    }
}
}

Some notes:

  • At each iteration, we compute the gradient with let grads = grad_fn(weights), which is fine but it reallocates a new vector at each call. If we wanted to optimize the gradient computation, we could pre-allocate a buffer outside the loop and pass a mutable reference into the gradient function to avoid repeated allocations. This would require to change the signature of the grad_fn.

  • weights.zip_mut_with(&grads, |w, &g| {{ ... }}): This is a mutable zip operation from the ndarray crate. It walks over weights and grads, applying the closure to each pair.

  • zip_mut_with is a method defined by the Zip trait, which is implemented for ArrayBase, and in particular for Array1. That’s why we can call it directly on weights.

  • In the closure statement we wrote: |w, &g| *w -= self.step_size * g;. Here, w is a mutable reference to each weight element, so we dereference it using *w to update its value. The &g in the closure means we’re pattern-matching by reference to avoid cloning or copying each f64.