# Gradient descent

Recall that the gradient descent update is given by:

$$ w \leftarrow w - \eta \, \nabla f(w), $$

where $\eta$ denotes the step size (the learning rate) and $f$ is the objective function to minimize. We first define the structure for the gradient descent algorithm. It only stores the learning rate as an `f64`.

```rust
pub struct GradientDescent {
    pub learning_rate: f64,
}
```

We then implement a constructor. In this case, it simply takes the chosen learning rate.

```rust
impl GradientDescent {
    /// Creates a new gradient descent optimizer.
    ///
    /// # Arguments
    /// - `learning_rate`: Step size used to update weights.
    pub fn new(learning_rate: f64) -> Self {
        Self { learning_rate }
    }
}
```

Next, we implement the `step` method required by the `Optimizer` trait:

```rust
impl Optimizer for GradientDescent {
    /// Applies the gradient descent step to each weight.
    ///
    /// Each weight is updated as: `w ← w - learning_rate * grad`
    fn step(&mut self, weights: &mut [f64], grads: &[f64]) {
        for (w, g) in weights.iter_mut().zip(grads.iter()) {
            *w -= self.learning_rate * g;
        }
    }
}
```

This method updates each entry of `weights` by looping over the elements and applying the gradient descent update. The weight `w` inside the loop must be dereferenced because `iter_mut` yields mutable references to the elements.
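
As a quick sanity check, here is a sketch of the optimizer minimizing $f(w) = w^2$, whose gradient at $w$ is $2w$. The minimal `Optimizer` trait declaration below is an assumption standing in for the one defined earlier in the book, and the struct and impls are repeated so the snippet runs on its own:

```rust
// Assumed stand-in for the `Optimizer` trait defined earlier in the book.
pub trait Optimizer {
    fn step(&mut self, weights: &mut [f64], grads: &[f64]);
}

pub struct GradientDescent {
    pub learning_rate: f64,
}

impl GradientDescent {
    pub fn new(learning_rate: f64) -> Self {
        Self { learning_rate }
    }
}

impl Optimizer for GradientDescent {
    fn step(&mut self, weights: &mut [f64], grads: &[f64]) {
        for (w, g) in weights.iter_mut().zip(grads.iter()) {
            *w -= self.learning_rate * g;
        }
    }
}

fn main() {
    // Minimize f(w) = w^2; the gradient at w is 2w.
    let mut optimizer = GradientDescent::new(0.1);
    let mut weights = vec![5.0];
    for _ in 0..50 {
        let grads: Vec<f64> = weights.iter().map(|w| 2.0 * w).collect();
        optimizer.step(&mut weights, &grads);
    }
    // Each step multiplies w by (1 - 2 * 0.1) = 0.8, so w decays toward 0.
    println!("w after 50 steps: {:.6}", weights[0]);
}
```

Each iteration scales the weight by $1 - 2\eta = 0.8$, so after fifty steps it has decayed to roughly $5 \cdot 0.8^{50} \approx 7 \times 10^{-5}$, close to the minimum at zero.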

We use elementwise operations because `Vec` doesn't provide built-in arithmetic methods. External crates such as `ndarray` or `nalgebra` could help write this more expressively.
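
For illustration, here is a sketch of the same update written with `ndarray`, assuming roughly `ndarray = "0.15"` in `Cargo.toml`; the helper name `gd_step` is ours, not part of the crate:

```rust
// Sketch only: assumes `ndarray = "0.15"` (or similar) in Cargo.toml.
use ndarray::Array1;

/// Hypothetical helper: the whole-array form of the update `w ← w - η·g`.
fn gd_step(weights: &mut Array1<f64>, grads: &Array1<f64>, learning_rate: f64) {
    // Array arithmetic replaces the explicit elementwise loop.
    *weights -= &(learning_rate * grads);
}

fn main() {
    let mut w = Array1::from(vec![1.0, -2.0, 3.0]);
    let g = Array1::from(vec![0.5, -1.0, 1.5]);
    gd_step(&mut w, &g, 0.1);
    println!("{}", w); // [0.95, -1.9, 2.85]
}
```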