Optimizers

This chapter documents the small optimization module used in the project: a minimal runtime‑polymorphic interface Optimizer with two concrete implementations, Gradient Descent and Momentum. It is designed for clarity and easy swapping of algorithms in training loops.

Problem setting

Given parameters w and a loss L(w), an optimizer updates the weights using the gradient g = ∇L(w). Each algorithm defines an update rule with its own hyper‑parameters (e.g., learning rate η, momentum μ).
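
Concretely, the two update rules documented in this chapter are (restated here from the code comments below; η is the learning rate, μ the momentum coefficient, g the gradient, v the velocity):

\[
\begin{aligned}
\text{Gradient descent:} \quad & w \leftarrow w - \eta\, g \\
\text{Momentum:} \quad & v \leftarrow \mu\, v + \eta\, g, \qquad w \leftarrow w - v
\end{aligned}
\]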

API overview

The full implementation lives in include/cppx/opt/optimizers.hpp and is reproduced below; we break it down in the remainder of this section.
#pragma once
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <vector>

namespace cppx::opt {

class Optimizer {
  public:
    virtual ~Optimizer() = default;

    // Apply one update step in place: each weights[i] is adjusted using grads[i].
    virtual void step(std::vector<double> &weights, const std::vector<double> &grads) = 0;
};

// --------------------------- GradientDescent ---------------------------
class GradientDescent final : public Optimizer {
  public:
    explicit GradientDescent(double learning_rate) : lr_(learning_rate) {}

    [[nodiscard]] double learning_rate() const noexcept { return lr_; }

    void step(std::vector<double> &weights, const std::vector<double> &grads) override {
        if (weights.size() != grads.size()) {
            throw std::invalid_argument("weights and grads size mismatch");
        }
        for (std::size_t i = 0; i < weights.size(); ++i) {
            weights[i] -= lr_ * grads[i];
        }
    }

  private:
    double lr_{};
};

// ------------------------------- Momentum ------------------------------
class Momentum final : public Optimizer {
  public:
    // struct Params {
    //     double learning_rate;
    //     double momentum;
    //     std::size_t dim;
    // };

    explicit Momentum(double learning_rate, double momentum, std::size_t dim)
        : lr_(learning_rate), mu_(momentum), v_(dim, 0.0) {}

    [[nodiscard]] double learning_rate() const noexcept { return lr_; }
    [[nodiscard]] double momentum() const noexcept { return mu_; }
    [[nodiscard]] const std::vector<double> &velocity() const noexcept { return v_; }

    void step(std::vector<double> &weights, const std::vector<double> &grads) override {
        if (weights.size() != grads.size()) {
            throw std::invalid_argument("weights and grads size mismatch");
        }
        if (v_.size() != weights.size()) {
            throw std::invalid_argument("velocity size mismatch");
        }

        for (std::size_t i = 0; i < weights.size(); ++i) {
            v_[i] = mu_ * v_[i] + lr_ * grads[i]; // v ← μ v + η g
            weights[i] -= v_[i];                  // w ← w − v
        }
    }

  private:
    double lr_{};
    double mu_{};
    std::vector<double> v_;
};

} // namespace cppx::opt

Design choices

  • A small virtual interface to enable swapping algorithms at runtime (a factory sketch follows this list).
  • std::unique_ptr<Optimizer> for owning polymorphism; borrowing functions accept Optimizer&.
  • Exceptions (std::invalid_argument) signal size mismatches.
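
One way to exercise this design is a small factory that picks the algorithm at runtime. The make_optimizer helper below is a hypothetical sketch, not part of the header:

#include <memory>
#include <stdexcept>
#include <string_view>

#include "cppx/opt/optimizers.hpp"

using namespace cppx::opt;

// Hypothetical helper: select a concrete Optimizer from a configuration string.
std::unique_ptr<Optimizer> make_optimizer(std::string_view name, double lr,
                                          double mu, std::size_t dim) {
    if (name == "gd")       return std::make_unique<GradientDescent>(lr);
    if (name == "momentum") return std::make_unique<Momentum>(lr, mu, dim);
    throw std::invalid_argument("unknown optimizer name");
}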

Gradient descent

Update rule: w ← w − η g, with learning rate η > 0.

Implementation

void GradientDescent::step(std::vector<double>& w,
                           const std::vector<double>& g) {
  if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
  for (std::size_t i = 0; i < w.size(); ++i) {
    w[i] -= lr_ * g[i];
  }
}
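
A minimal usage sketch with made-up values, to make the arithmetic concrete:

GradientDescent gd(/*learning_rate=*/0.1);
std::vector<double> w{1.0, -2.0};
std::vector<double> g{0.5, 0.5};
gd.step(w, g); // w becomes {0.95, -2.05}: each weight moves by −0.1 · 0.5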

Momentum-based gradient descent

Update rule: v ← μ v + η g, then w ← w − v, with momentum coefficient μ and learning rate η.

Implementation

Momentum::Momentum(double learning_rate, double momentum, std::size_t dim)
  : lr_(learning_rate), mu_(momentum), v_(dim, 0.0) {}

void Momentum::step(std::vector<double>& w, const std::vector<double>& g) {
  if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
  if (v_.size() != w.size()) throw std::invalid_argument("velocity size mismatch");

  for (std::size_t i = 0; i < w.size(); ++i) {
    v_[i] = mu_ * v_[i] + lr_ * g[i];
    w[i] -= v_[i];
  }
}
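
A minimal usage sketch with made-up values, showing how the velocity accumulates across steps:

Momentum opt(/*learning_rate=*/0.1, /*momentum=*/0.9, /*dim=*/1);
std::vector<double> w{1.0};
std::vector<double> g{1.0};
opt.step(w, g); // v = 0.9·0 + 0.1·1 = 0.1,   w = 1.0 − 0.1  = 0.9
opt.step(w, g); // v = 0.9·0.1 + 0.1·1 = 0.19, w = 0.9 − 0.19 = 0.71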

Using the optimizers

Owning an optimizer (runtime polymorphism)

#include <memory>
#include "cppx/opt/optimizers.hpp"

using namespace cppx::opt;

const std::size_t d = 10;  // parameter dimension (example value)
std::vector<double> w(d, 0.0), g(d, 0.0);

// Choose an algorithm at runtime:
std::unique_ptr<Optimizer> opt =
    std::make_unique<Momentum>(/*lr=*/0.1, /*mu=*/0.9, /*dim=*/w.size());

for (int epoch = 0; epoch < 100; ++epoch) {
  // ... compute gradients into g ...
  opt->step(w, g);           // updates w in place
}

Borrowing an optimizer (no ownership transfer)

void train_one_epoch(Optimizer& opt,
                     std::vector<double>& w,
                     std::vector<double>& g) {
  // ... fill g ...
  opt.step(w, g);
}
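
The same call site then works for any concrete optimizer; for example (reusing w and g from the previous snippet):

GradientDescent gd(0.05);
train_one_epoch(gd, w, g);         // no ownership transfer, just a reference

Momentum mom(0.05, 0.9, w.size());
train_one_epoch(mom, w, g);        // swapping the algorithm needs no other change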

API variations (optional)

If C++20 is available, std::span can make the interface container‑agnostic:

// virtual void step(std::span<double> w, std::span<const double> g) = 0;
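
A minimal sketch of that variation, assuming C++20; the names SpanOptimizer and SpanGradientDescent are illustrative, not part of the header:

#include <cstddef>
#include <span>

namespace cppx::opt {

// Container-agnostic interface: accepts any contiguous range of doubles.
class SpanOptimizer {
  public:
    virtual ~SpanOptimizer() = default;
    virtual void step(std::span<double> weights, std::span<const double> grads) = 0;
};

class SpanGradientDescent final : public SpanOptimizer {
  public:
    explicit SpanGradientDescent(double learning_rate) : lr_(learning_rate) {}

    void step(std::span<double> w, std::span<const double> g) override {
        // Callers are expected to pass equally sized views.
        for (std::size_t i = 0; i < w.size(); ++i) {
            w[i] -= lr_ * g[i];
        }
    }

  private:
    double lr_{};
};

} // namespace cppx::opt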