Optimizers

This chapter documents the small optimization module used in the project: a minimal runtime‑polymorphic interface Optimizer with two concrete implementations, Gradient Descent and Momentum. It is designed for clarity and easy swapping of algorithms in training loops.

Problem setting

Given parameters w and a loss L(w), an optimizer updates the weights using the gradient g = ∇L(w). Each algorithm defines an update rule with its own hyper‑parameters (e.g., learning rate η, momentum μ).
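
Concretely, the two update rules documented in this chapter are (restated here from the code comments below; η is the learning rate, μ the momentum coefficient, g the gradient, v the velocity):

\[
\begin{aligned}
\text{Gradient descent:} \quad & w \leftarrow w - \eta\, g \\
\text{Momentum:} \quad & v \leftarrow \mu\, v + \eta\, g, \qquad w \leftarrow w - v
\end{aligned}
\]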

API overview

The full implementation lives in include/cppx/opt/optimizers.hpp and is reproduced below; we break it down in the remainder of this section.
#pragma once
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <vector>

namespace cppx::opt {

class Optimizer {
  public:
    virtual ~Optimizer() = default;

    // Apply one update step in place: each weights[i] is adjusted using grads[i].
    virtual void step(std::vector<double> &weights, const std::vector<double> &grads) = 0;
};

// --------------------------- GradientDescent ---------------------------
class GradientDescent final : public Optimizer {
  public:
    explicit GradientDescent(double learning_rate) : lr_(learning_rate) {}

    [[nodiscard]] double learning_rate() const noexcept { return lr_; }

    void step(std::vector<double> &weights, const std::vector<double> &grads) override {
        if (weights.size() != grads.size()) {
            throw std::invalid_argument("weights and grads size mismatch");
        }
        for (std::size_t i = 0; i < weights.size(); ++i) {
            weights[i] -= lr_ * grads[i];
        }
    }

  private:
    double lr_{};
};

// ------------------------------- Momentum ------------------------------
class Momentum final : public Optimizer {
  public:
    // struct Params {
    //     double learning_rate;
    //     double momentum;
    //     std::size_t dim;
    // };

    explicit Momentum(double learning_rate, double momentum, std::size_t dim)
        : lr_(learning_rate), mu_(momentum), v_(dim, 0.0) {}

    [[nodiscard]] double learning_rate() const noexcept { return lr_; }
    [[nodiscard]] double momentum() const noexcept { return mu_; }
    [[nodiscard]] const std::vector<double> &velocity() const noexcept { return v_; }

    void step(std::vector<double> &weights, const std::vector<double> &grads) override {
        if (weights.size() != grads.size()) {
            throw std::invalid_argument("weights and grads size mismatch");
        }
        if (v_.size() != weights.size()) {
            throw std::invalid_argument("velocity size mismatch");
        }

        for (std::size_t i = 0; i < weights.size(); ++i) {
            v_[i] = mu_ * v_[i] + lr_ * grads[i]; // v ← μ v + η g
            weights[i] -= v_[i];                  // w ← w − v
        }
    }

  private:
    double lr_{};
    double mu_{};
    std::vector<double> v_;
};

} // namespace cppx::opt

Design choices

  • A small virtual interface to enable swapping algorithms at runtime (a factory sketch follows this list).
  • std::unique_ptr<Optimizer> for owning polymorphism; borrowing functions accept Optimizer&.
  • Exceptions (std::invalid_argument) signal size mismatches.
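
One way to exercise this design is a small factory that picks the algorithm at runtime. The make_optimizer helper below is a hypothetical sketch, not part of the header:

#include <memory>
#include <stdexcept>
#include <string_view>

#include "cppx/opt/optimizers.hpp"

using namespace cppx::opt;

// Hypothetical helper: select a concrete Optimizer from a configuration string.
std::unique_ptr<Optimizer> make_optimizer(std::string_view name, double lr,
                                          double mu, std::size_t dim) {
    if (name == "gd")       return std::make_unique<GradientDescent>(lr);
    if (name == "momentum") return std::make_unique<Momentum>(lr, mu, dim);
    throw std::invalid_argument("unknown optimizer name");
}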

Gradient descent

Update rule: w ← w − η g, with learning rate η > 0.

Implementation

void GradientDescent::step(std::vector<double>& w,
                           const std::vector<double>& g) {
  if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
  for (std::size_t i = 0; i < w.size(); ++i) {
    w[i] -= lr_ * g[i];
  }
}
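
A minimal usage sketch with made-up values, to make the arithmetic concrete:

GradientDescent gd(/*learning_rate=*/0.1);
std::vector<double> w{1.0, -2.0};
std::vector<double> g{0.5, 0.5};
gd.step(w, g); // w becomes {0.95, -2.05}: each weight moves by −0.1 · 0.5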

Momentum-based gradient descent

Update rule: v ← μ v + η g, then w ← w − v, with momentum coefficient μ and learning rate η.

Implementation

Momentum::Momentum(double learning_rate, double momentum, std::size_t dim)
  : lr_(learning_rate), mu_(momentum), v_(dim, 0.0) {}

void Momentum::step(std::vector<double>& w, const std::vector<double>& g) {
  if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
  if (v_.size() != w.size()) throw std::invalid_argument("velocity size mismatch");

  for (std::size_t i = 0; i < w.size(); ++i) {
    v_[i] = mu_ * v_[i] + lr_ * g[i];
    w[i] -= v_[i];
  }
}
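
A minimal usage sketch with made-up values, showing how the velocity accumulates across steps:

Momentum opt(/*learning_rate=*/0.1, /*momentum=*/0.9, /*dim=*/1);
std::vector<double> w{1.0};
std::vector<double> g{1.0};
opt.step(w, g); // v = 0.9·0 + 0.1·1 = 0.1,   w = 1.0 − 0.1  = 0.9
opt.step(w, g); // v = 0.9·0.1 + 0.1·1 = 0.19, w = 0.9 − 0.19 = 0.71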

Using the optimizers

Owning an optimizer (runtime polymorphism)

#include <memory>
#include "cppx/opt/optimizers.hpp"

using namespace cppx::opt;

const std::size_t d = 10;  // parameter dimension (example value)
std::vector<double> w(d, 0.0), g(d, 0.0);

// Choose an algorithm at runtime:
std::unique_ptr<Optimizer> opt =
    std::make_unique<Momentum>(/*lr=*/0.1, /*mu=*/0.9, /*dim=*/w.size());

for (int epoch = 0; epoch < 100; ++epoch) {
  // ... compute gradients into g ...
  opt->step(w, g);           // updates w in place
}

Borrowing an optimizer (no ownership transfer)

void train_one_epoch(Optimizer& opt,
                     std::vector<double>& w,
                     std::vector<double>& g) {
  // ... fill g ...
  opt.step(w, g);
}
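
The same call site then works for any concrete optimizer; for example (reusing w and g from the previous snippet):

GradientDescent gd(0.05);
train_one_epoch(gd, w, g);         // no ownership transfer, just a reference

Momentum mom(0.05, 0.9, w.size());
train_one_epoch(mom, w, g);        // swapping the algorithm needs no other change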

API variations (optional)

If C++20 is available, std::span can make the interface container‑agnostic:

// virtual void step(std::span<double> w, std::span<const double> g) = 0;
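
A minimal sketch of that variation, assuming C++20; the names SpanOptimizer and SpanGradientDescent are illustrative, not part of the header:

#include <cstddef>
#include <span>

namespace cppx::opt {

// Container-agnostic interface: accepts any contiguous range of doubles.
class SpanOptimizer {
  public:
    virtual ~SpanOptimizer() = default;
    virtual void step(std::span<double> weights, std::span<const double> grads) = 0;
};

class SpanGradientDescent final : public SpanOptimizer {
  public:
    explicit SpanGradientDescent(double learning_rate) : lr_(learning_rate) {}

    void step(std::span<double> w, std::span<const double> g) override {
        // Callers are expected to pass equally sized views.
        for (std::size_t i = 0; i < w.size(); ++i) {
            w[i] -= lr_ * g[i];
        }
    }

  private:
    double lr_{};
};

} // namespace cppx::opt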