Optimizers
This chapter documents the small optimization module used in the project: a minimal runtime-polymorphic interface, Optimizer, with two concrete implementations, GradientDescent and Momentum. The module is designed for clarity and for easy swapping of algorithms in training loops.
Problem setting
Given parameters $w \in \mathbb{R}^d$ and a loss $L(w)$, an optimizer updates the weights using the gradient $g = \nabla_w L(w)$. Each algorithm defines an update rule with its own hyper-parameters (e.g., learning rate, momentum).
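For example, a single hand-written gradient-descent step on the one-dimensional quadratic loss $L(w) = \tfrac{1}{2}(w - 3)^2$, whose gradient is $w - 3$, looks like the following sketch (plain C++, not part of the library):

#include <iostream>

int main() {
    double w = 0.0;          // parameter
    const double lr = 0.1;   // learning rate η
    for (int i = 0; i < 50; ++i) {
        const double g = w - 3.0;  // ∇L(w) for L(w) = 0.5 * (w - 3)^2
        w -= lr * g;               // gradient-descent update
    }
    std::cout << w << '\n';  // converges toward the minimizer w = 3
}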
API overview
The full implementation lives in include/cppx/opt/optimizers.hpp; we break it down over the rest of this section.
#pragma once
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <vector>
namespace cppx::opt {
class Optimizer {
public:
    virtual ~Optimizer() = default;
    virtual void step(std::vector<double> &weights, const std::vector<double> &grads) = 0;
};
// --------------------------- GradientDescent ---------------------------
class GradientDescent final : public Optimizer {
public:
    explicit GradientDescent(double learning_rate) : lr_(learning_rate) {}

    [[nodiscard]] double learning_rate() const noexcept { return lr_; }

    void step(std::vector<double> &weights, const std::vector<double> &grads) override {
        if (weights.size() != grads.size()) {
            throw std::invalid_argument("weights and grads size mismatch");
        }
        for (std::size_t i = 0; i < weights.size(); ++i) {
            weights[i] -= lr_ * grads[i];
        }
    }

private:
    double lr_{};
};
// ------------------------------- Momentum ------------------------------
class Momentum final : public Optimizer {
public:
    // struct Params {
    //     double learning_rate;
    //     double momentum;
    //     std::size_t dim;
    // };

    explicit Momentum(double learning_rate, double momentum, std::size_t dim)
        : lr_(learning_rate), mu_(momentum), v_(dim, 0.0) {}

    [[nodiscard]] double learning_rate() const noexcept { return lr_; }
    [[nodiscard]] double momentum() const noexcept { return mu_; }
    [[nodiscard]] const std::vector<double> &velocity() const noexcept { return v_; }

    void step(std::vector<double> &weights, const std::vector<double> &grads) override {
        if (weights.size() != grads.size()) {
            throw std::invalid_argument("weights and grads size mismatch");
        }
        if (v_.size() != weights.size()) {
            throw std::invalid_argument("velocity size mismatch");
        }
        for (std::size_t i = 0; i < weights.size(); ++i) {
            v_[i] = mu_ * v_[i] + lr_ * grads[i]; // v ← μ v + η g
            weights[i] -= v_[i];                  // w ← w − v
        }
    }

private:
    double lr_{};
    double mu_{};
    std::vector<double> v_;
};
} // namespace cppx::opt
Design choices
- A small virtual interface to enable swapping algorithms at runtime.
- std::unique_ptr<Optimizer> for owning polymorphism; borrowing functions accept Optimizer&.
- Exceptions (std::invalid_argument) signal size mismatches.
Gradient descent
Update rule: $w \leftarrow w - \eta\, g$, with learning rate $\eta > 0$.
Implementation
void GradientDescent::step(std::vector<double>& w,
                           const std::vector<double>& g) {
    if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
    for (std::size_t i = 0; i < w.size(); ++i) {
        w[i] -= lr_ * g[i];
    }
}
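As a quick illustration (a sketch using example values, not part of the header), one GradientDescent step with $\eta = 0.1$ applied to weights $(0, 0)$ and gradients $(1, -2)$ yields $(-0.1, 0.2)$:

#include <vector>
#include "cppx/opt/optimizers.hpp"

int main() {
    using cppx::opt::GradientDescent;

    std::vector<double> w{0.0, 0.0};
    std::vector<double> g{1.0, -2.0};

    GradientDescent sgd(/*learning_rate=*/0.1);
    sgd.step(w, g);  // w is now {-0.1, 0.2}
}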
Momentum-based gradient descent
Update rule: $v \leftarrow \mu v + \eta g$, then $w \leftarrow w - v$, with momentum $\mu \in [0, 1)$ and learning rate $\eta > 0$.
Implementation
Momentum::Momentum(double learning_rate, double momentum, std::size_t dim)
    : lr_(learning_rate), mu_(momentum), v_(dim, 0.0) {}

void Momentum::step(std::vector<double>& w, const std::vector<double>& g) {
    if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
    if (v_.size() != w.size()) throw std::invalid_argument("velocity size mismatch");
    for (std::size_t i = 0; i < w.size(); ++i) {
        v_[i] = mu_ * v_[i] + lr_ * g[i];
        w[i] -= v_[i];
    }
}
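To see how the velocity accumulates, here is a small sketch (example values, not part of the header) applying two Momentum steps to a single weight with a constant gradient of 1, $\eta = 0.1$, and $\mu = 0.9$; the first step moves the weight by $0.1$, the second by $0.9 \cdot 0.1 + 0.1 = 0.19$:

#include <vector>
#include "cppx/opt/optimizers.hpp"

int main() {
    using cppx::opt::Momentum;

    std::vector<double> w{0.0};
    std::vector<double> g{1.0};  // constant gradient

    Momentum opt(/*learning_rate=*/0.1, /*momentum=*/0.9, /*dim=*/1);
    opt.step(w, g);  // v = 0.10, w = -0.10
    opt.step(w, g);  // v = 0.19, w = -0.29
}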
Using the optimizers
Owning an optimizer (runtime polymorphism)
#include <memory>
#include "cppx/opt/optimizers.hpp"
using namespace cppx::opt;
const std::size_t d = 10;  // dimension of the parameter vector (example value)
std::vector<double> w(d, 0.0), g(d, 0.0);

// Choose an algorithm at runtime:
std::unique_ptr<Optimizer> opt =
    std::make_unique<Momentum>(/*lr=*/0.1, /*mu=*/0.9, /*dim=*/w.size());

for (int epoch = 0; epoch < 100; ++epoch) {
    // ... compute gradients into g ...
    opt->step(w, g);  // updates w in place
}
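If the algorithm is selected from configuration, a small factory keeps the training loop unchanged. The make_optimizer helper below is a hypothetical sketch, not part of the header:

#include <memory>
#include <stdexcept>
#include <string>
#include "cppx/opt/optimizers.hpp"

using namespace cppx::opt;

// Hypothetical helper: map a configuration string to a concrete optimizer.
std::unique_ptr<Optimizer> make_optimizer(const std::string& name, double lr,
                                          double mu, std::size_t dim) {
    if (name == "sgd")      return std::make_unique<GradientDescent>(lr);
    if (name == "momentum") return std::make_unique<Momentum>(lr, mu, dim);
    throw std::invalid_argument("unknown optimizer: " + name);
}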
Borrowing an optimizer (no ownership transfer)
void train_one_epoch(Optimizer& opt,
                     std::vector<double>& w,
                     std::vector<double>& g) {
    // ... fill g ...
    opt.step(w, g);
}
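Either concrete optimizer can then be passed in by reference, for example (illustrative values):

Momentum opt(/*learning_rate=*/0.05, /*momentum=*/0.9, /*dim=*/w.size());
for (int epoch = 0; epoch < 100; ++epoch) {
    train_one_epoch(opt, w, g);  // no ownership transfer; opt keeps its velocity across epochs
}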
API variations (optional)
If C++20 is available, std::span can make the interface container-agnostic:
// virtual void step(std::span<double> w, std::span<const double> g) = 0;
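A sketch of what that variant could look like (an assumed extension, not the current header):

#include <span>
#include <stdexcept>

namespace cppx::opt {

// Hypothetical span-based variant of the interface: works with vectors, arrays, or raw buffers.
class SpanOptimizer {
public:
    virtual ~SpanOptimizer() = default;
    virtual void step(std::span<double> w, std::span<const double> g) = 0;
};

class SpanGradientDescent final : public SpanOptimizer {
public:
    explicit SpanGradientDescent(double learning_rate) : lr_(learning_rate) {}

    void step(std::span<double> w, std::span<const double> g) override {
        if (w.size() != g.size()) throw std::invalid_argument("size mismatch");
        for (std::size_t i = 0; i < w.size(); ++i) {
            w[i] -= lr_ * g[i];
        }
    }

private:
    double lr_{};
};

} // namespace cppx::opt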