# Gradient descent

The `GD` struct implements basic gradient descent with a fixed step size:
```rust
pub struct GD {
    step_size: f64,
}
```
It has a constructor:
```rust
impl GD {
    pub fn new(step_size: f64) -> Self {
        Self { step_size }
    }
}
```
And implements `Optimizer` by subtracting the gradient, scaled by the step size, from the weights at each iteration:
```rust
impl Optimizer for GD {
    fn run(
        &self,
        weights: &mut Array1<f64>,
        grad_fn: impl Fn(&Array1<f64>) -> Array1<f64>,
        n_steps: usize,
    ) {
        for _ in 0..n_steps {
            let grads = grad_fn(weights);
            weights.zip_mut_with(&grads, |w, &g| {
                *w -= self.step_size * g;
            });
        }
    }
}
```
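To see the loop in action, here is a minimal, dependency-free sketch: the `Optimizer` trait below mirrors the signature above, but `Vec<f64>` stands in for ndarray's `Array1<f64>` so it compiles without the crate, and the objective f(w) = ||w||² (with gradient 2w) is an assumption chosen for illustration.

```rust
// Stand-in for the trait assumed above; `Vec<f64>` replaces `Array1<f64>`
// so the sketch is self-contained.
trait Optimizer {
    fn run(
        &self,
        weights: &mut Vec<f64>,
        grad_fn: impl Fn(&Vec<f64>) -> Vec<f64>,
        n_steps: usize,
    );
}

struct GD {
    step_size: f64,
}

impl GD {
    fn new(step_size: f64) -> Self {
        Self { step_size }
    }
}

impl Optimizer for GD {
    fn run(
        &self,
        weights: &mut Vec<f64>,
        grad_fn: impl Fn(&Vec<f64>) -> Vec<f64>,
        n_steps: usize,
    ) {
        for _ in 0..n_steps {
            let grads = grad_fn(weights);
            // Element-wise update, playing the role of `zip_mut_with`
            // in the ndarray version.
            for (w, g) in weights.iter_mut().zip(&grads) {
                *w -= self.step_size * g;
            }
        }
    }
}

fn main() {
    // Minimize f(w) = ||w||^2, whose gradient is 2w.
    let mut weights = vec![1.0, -2.0];
    let gd = GD::new(0.1);
    gd.run(&mut weights, |w| w.iter().map(|wi| 2.0 * wi).collect(), 100);
    // Each step scales every weight by (1 - 0.1 * 2) = 0.8, so they shrink toward 0.
    assert!(weights.iter().all(|w| w.abs() < 1e-6));
    println!("final weights: {:?}", weights);
}
```

Note that each step multiplies every weight by 0.8, so convergence to the minimum at the origin is geometric.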
Some notes:
- At each iteration we compute the gradient with `let grads = grad_fn(weights)`. This works, but it allocates a new vector on every call. To avoid the repeated allocations, we could pre-allocate a buffer outside the loop and pass a mutable reference into the gradient function, which would require changing the signature of `grad_fn`.
- `weights.zip_mut_with(&grads, |w, &g| { ... })` is a mutable zip operation from the `ndarray` crate. It walks over `weights` and `grads` in lockstep, applying the closure to each pair of elements.
- `zip_mut_with` is a method on `ArrayBase` (available when the array's data is mutable), and `Array1<f64>` is a type alias for `ArrayBase` with owned, one-dimensional storage. That's why we can call it directly on `weights`.
- In the closure we wrote `|w, &g| *w -= self.step_size * g;`. Here, `w` is a mutable reference to a weight element, so we dereference it with `*w` to update its value in place. The `&g` pattern destructures the reference to each gradient element, so `g` binds to a plain `f64` (a cheap copy, since `f64` is `Copy`).
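The pre-allocation idea from the first note can be made concrete. In this hedged sketch (again with `Vec<f64>` and slices standing in for ndarray types, and names chosen for illustration), `grad_fn` writes into a caller-provided buffer, so the loop allocates once up front instead of once per step:

```rust
// Sketch of the allocation-free variant: `grad_fn` fills a provided
// buffer instead of returning a fresh vector each iteration.
struct GD {
    step_size: f64,
}

impl GD {
    fn run_buffered(
        &self,
        weights: &mut [f64],
        grad_fn: impl Fn(&[f64], &mut [f64]),
        n_steps: usize,
    ) {
        // Single allocation, reused across all iterations.
        let mut grads = vec![0.0; weights.len()];
        for _ in 0..n_steps {
            grad_fn(weights, &mut grads);
            for (w, g) in weights.iter_mut().zip(&grads) {
                *w -= self.step_size * g;
            }
        }
    }
}

fn main() {
    let mut weights = vec![1.0, -2.0];
    let gd = GD { step_size: 0.1 };
    // Gradient of f(w) = ||w||^2, written into the buffer in place.
    gd.run_buffered(
        &mut weights,
        |w, out| {
            for (o, wi) in out.iter_mut().zip(w) {
                *o = 2.0 * wi;
            }
        },
        100,
    );
    assert!(weights.iter().all(|w| w.abs() < 1e-6));
    println!("final weights: {:?}", weights);
}
```

The trade-off is a slightly clumsier closure signature: callers must fill `out` rather than return a value, but the hot loop no longer touches the allocator.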