# Gradient descent

The `GD` struct implements basic gradient descent with a fixed step size:
```rust
pub struct GD {
    step_size: f64,
}
```
It has a constructor:
```rust
impl GD {
    pub fn new(step_size: f64) -> Self {
        Self { step_size }
    }
}
```
And implements `Optimizer` by subtracting the gradient, scaled by the step size, from the weights at each iteration:
```rust
impl Optimizer for GD {
    fn run(
        &self,
        weights: &mut Array1<f64>,
        grad_fn: impl Fn(&Array1<f64>) -> Array1<f64>,
        n_steps: usize,
    ) {
        for _ in 0..n_steps {
            let grads = grad_fn(weights);
            weights.zip_mut_with(&grads, |w, &g| {
                *w -= self.step_size * g;
            });
        }
    }
}
```
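To see the loop in action, here is a minimal, dependency-free sketch: the `Optimizer` trait below mirrors the signature above, but `Vec<f64>` stands in for ndarray's `Array1<f64>` so it compiles without the crate, and the objective f(w) = ||w||² (with gradient 2w) is an assumption chosen for illustration.

```rust
// Stand-in for the trait assumed above; `Vec<f64>` replaces `Array1<f64>`
// so the sketch is self-contained.
trait Optimizer {
    fn run(
        &self,
        weights: &mut Vec<f64>,
        grad_fn: impl Fn(&Vec<f64>) -> Vec<f64>,
        n_steps: usize,
    );
}

struct GD {
    step_size: f64,
}

impl GD {
    fn new(step_size: f64) -> Self {
        Self { step_size }
    }
}

impl Optimizer for GD {
    fn run(
        &self,
        weights: &mut Vec<f64>,
        grad_fn: impl Fn(&Vec<f64>) -> Vec<f64>,
        n_steps: usize,
    ) {
        for _ in 0..n_steps {
            let grads = grad_fn(weights);
            // Element-wise update, playing the role of `zip_mut_with`
            // in the ndarray version.
            for (w, g) in weights.iter_mut().zip(&grads) {
                *w -= self.step_size * g;
            }
        }
    }
}

fn main() {
    // Minimize f(w) = ||w||^2, whose gradient is 2w.
    let mut weights = vec![1.0, -2.0];
    let gd = GD::new(0.1);
    gd.run(&mut weights, |w| w.iter().map(|wi| 2.0 * wi).collect(), 100);
    // Each step scales every weight by (1 - 0.1 * 2) = 0.8, so they shrink toward 0.
    assert!(weights.iter().all(|w| w.abs() < 1e-6));
    println!("final weights: {:?}", weights);
}
```

Note that each step multiplies every weight by 0.8, so convergence to the minimum at the origin is geometric.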
Some notes:
- At each iteration we compute the gradient with `let grads = grad_fn(weights)`. This works, but it allocates a new vector on every call. To avoid the repeated allocations, we could pre-allocate a buffer outside the loop and pass a mutable reference into the gradient function, which would require changing the signature of `grad_fn`.
- `weights.zip_mut_with(&grads, |w, &g| { ... })` is a mutable zip operation from the `ndarray` crate. It walks over `weights` and `grads` in lockstep, applying the closure to each pair of elements.
- `zip_mut_with` is a method on `ArrayBase` (available when the array's data is mutable), and `Array1<f64>` is a type alias for `ArrayBase` with owned, one-dimensional storage. That's why we can call it directly on `weights`.
- In the closure we wrote `|w, &g| *w -= self.step_size * g;`. Here, `w` is a mutable reference to a weight element, so we dereference it with `*w` to update its value in place. The `&g` pattern destructures the reference to each gradient element, so `g` binds to a plain `f64` (a cheap copy, since `f64` is `Copy`).
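The pre-allocation idea from the first note can be made concrete. In this hedged sketch (again with `Vec<f64>` and slices standing in for ndarray types, and names chosen for illustration), `grad_fn` writes into a caller-provided buffer, so the loop allocates once up front instead of once per step:

```rust
// Sketch of the allocation-free variant: `grad_fn` fills a provided
// buffer instead of returning a fresh vector each iteration.
struct GD {
    step_size: f64,
}

impl GD {
    fn run_buffered(
        &self,
        weights: &mut [f64],
        grad_fn: impl Fn(&[f64], &mut [f64]),
        n_steps: usize,
    ) {
        // Single allocation, reused across all iterations.
        let mut grads = vec![0.0; weights.len()];
        for _ in 0..n_steps {
            grad_fn(weights, &mut grads);
            for (w, g) in weights.iter_mut().zip(&grads) {
                *w -= self.step_size * g;
            }
        }
    }
}

fn main() {
    let mut weights = vec![1.0, -2.0];
    let gd = GD { step_size: 0.1 };
    // Gradient of f(w) = ||w||^2, written into the buffer in place.
    gd.run_buffered(
        &mut weights,
        |w, out| {
            for (o, wi) in out.iter_mut().zip(w) {
                *o = 2.0 * wi;
            }
        },
        100,
    );
    assert!(weights.iter().all(|w| w.abs() < 1e-6));
    println!("final weights: {:?}", weights);
}
```

The trade-off is a slightly clumsier closure signature: callers must fill `out` rather than return a value, but the hot loop no longer touches the allocator.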