Gradient descent
Recall that the gradient descent algorithm is given by:

$$ w \leftarrow w - \eta \, \nabla f(w) $$

where $\eta$ denotes the step size, and $f$ is the objective function to minimize.
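As a quick numeric illustration (the values below are chosen only for this example), a single update with step size $\eta = 0.1$ on $f(w) = w^2$ at $w = 2$, where $\nabla f(w) = 2w = 4$, gives:

$$ w \leftarrow 2 - 0.1 \cdot 4 = 1.6 $$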
We first define the structure for the gradient descent algorithm. It only stores the learning rate as an `f64`.
```rust
pub struct GradientDescent {
    pub learning_rate: f64,
}
```
We then implement a constructor, which in this case simply takes the learning rate.
```rust
impl GradientDescent {
    /// Creates a new gradient descent optimizer.
    ///
    /// # Arguments
    /// - `learning_rate`: Step size used to update weights.
    pub fn new(learning_rate: f64) -> Self {
        Self { learning_rate }
    }
}
```
Next, we implement the `step` method required by the `Optimizer` trait:
```rust
impl Optimizer for GradientDescent {
    /// Applies the gradient descent step to each weight.
    ///
    /// Each weight is updated as: `w ← w - learning_rate * grad`
    fn step(&mut self, weights: &mut [f64], grads: &[f64]) {
        for (w, g) in weights.iter_mut().zip(grads.iter()) {
            *w -= self.learning_rate * g;
        }
    }
}
```
This function updates each entry of `weights` by looping over the elements and applying the gradient descent update. The weight `w` inside the loop must be dereferenced because `iter_mut` yields a mutable reference to each element.
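As a quick sanity check, here is a minimal sketch (assuming the `GradientDescent` struct and the `Optimizer` trait from this chapter are in scope) that drives a simple quadratic toward its minimum:

```rust
fn main() {
    let mut optimizer = GradientDescent::new(0.1);
    // Minimize f(w) = w₁² + w₂²; its gradient is (2·w₁, 2·w₂).
    let mut weights = vec![1.0, -2.0];

    for _ in 0..100 {
        let grads: Vec<f64> = weights.iter().map(|w| 2.0 * w).collect();
        optimizer.step(&mut weights, &grads);
    }

    // After 100 steps with learning rate 0.1, both weights are close to 0.
    println!("{weights:?}");
}
```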
We use elementwise operations because `Vec` doesn't provide built-in arithmetic methods. External crates such as `ndarray` or `nalgebra` could help write this more expressively.
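For instance, here is a minimal sketch of the same update written with `ndarray` (an external crate not used in this chapter; the function name `step_ndarray` is purely illustrative):

```rust
use ndarray::Array1;

/// Hypothetical variant of the update using `ndarray`'s vector arithmetic.
fn step_ndarray(learning_rate: f64, weights: &mut Array1<f64>, grads: &Array1<f64>) {
    // Elementwise w ← w - learning_rate * grad, written as a single vector expression.
    *weights -= &(grads * learning_rate);
}

fn main() {
    let mut weights = Array1::from(vec![1.0, -2.0]);
    let grads = Array1::from(vec![0.5, 0.5]);
    step_ndarray(0.1, &mut weights, &grads);
    println!("{weights}"); // [0.95, -2.05]
}
```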