Introduction
In this book, we explore mid-price prediction in financial markets through the combined lens of statistical filtering and machine learning. The mid-price—halfway between the best bid and best ask—captures the evolving consensus of market participants and serves as a natural target for short-term price forecasting.
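As a concrete illustration of this definition, the mid-price can be computed from the top of the book as sketched below. This is a minimal sketch: the function name `mid_price` and the sample quotes are hypothetical and are not taken from any dataset used in this book.

```python
def mid_price(best_bid: float, best_ask: float) -> float:
    """Return the midpoint between the best bid and the best ask."""
    if best_ask < best_bid:
        # A crossed book usually signals a data problem, so we refuse to average it.
        raise ValueError("crossed book: best ask is below best bid")
    return 0.5 * (best_bid + best_ask)


# Hypothetical top-of-book quotes as (best bid, best ask) pairs, not real market data.
quotes = [(100.00, 100.02), (100.01, 100.02), (100.01, 100.03)]
mids = [mid_price(bid, ask) for bid, ask in quotes]
```

Note that the mid-price can move even when no trade occurs, since it depends only on the best quotes.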
We begin by implementing a Kalman filter as a statistical baseline for sequential state estimation. From there, we train and evaluate a range of machine learning models to assess how modern approaches compare with classical inference methods.
Our goals are two-fold:
- Implement classical inference algorithms such as the Kalman filter in C++ for efficiency and precision, with Python bindings for experimentation.
- Compare these algorithms against machine learning models in terms of predictive accuracy, robustness, and computational performance.
The comparison will be carried out on classical mid-price forecasting datasets, including:
- FI-2010: a publicly available benchmark dataset of limit order book data for mid-price forecasting
- LOBSTER: a dataset of real limit order book data with millisecond-level resolution
By the end, we should have a practical understanding of how statistical filters and machine learning can be applied to mid-price prediction.
Model formulation
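We consider a linear dynamical system with additive Gaussian noise:
$$
\begin{aligned}
x_k &= F x_{k-1} + B u_k + w_k, & w_k &\sim \mathcal{N}(0, Q), \\
z_k &= H x_k + v_k, & v_k &\sim \mathcal{N}(0, R),
\end{aligned}
$$
where:
- $x_k$ is the state vector at time step $k$,
- $u_k$ is an optional control input,
- $z_k$ is the measurement vector,
- $F$ is the state transition matrix,
- $B$ is the control-input matrix,
- $H$ is the observation matrix,
- $Q$ is the process noise covariance,
- $R$ is the measurement noise covariance.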
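To make the model concrete, the sketch below draws one trajectory from such a linear Gaussian system using NumPy. It assumes the notation `F`, `B`, `H`, `Q`, `R` for the state transition, control-input, observation, process noise covariance, and measurement noise covariance matrices. The helper name `simulate_lgssm` and all parameter values (a two-dimensional state holding a price level and a drift, of which only the level is observed) are illustrative assumptions, not the configuration used later in the book.

```python
import numpy as np


def simulate_lgssm(F, B, H, Q, R, x0, controls, rng):
    """Draw one trajectory from x_k = F x_{k-1} + B u_k + w_k, z_k = H x_k + v_k."""
    n, m = F.shape[0], H.shape[0]
    x = np.asarray(x0, dtype=float)
    states, measurements = [], []
    for u in controls:
        w = rng.multivariate_normal(np.zeros(n), Q)  # process noise, w_k ~ N(0, Q)
        v = rng.multivariate_normal(np.zeros(m), R)  # measurement noise, v_k ~ N(0, R)
        x = F @ x + B @ u + w                        # state transition
        z = H @ x + v                                # noisy observation of the state
        states.append(x)
        measurements.append(z)
    return np.array(states), np.array(measurements)


# Illustrative (assumed) parameters: state = [price level, drift], level observed.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.zeros((2, 1))            # the control input has no effect in this example
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[1e-2]])

rng = np.random.default_rng(0)
controls = np.zeros((50, 1))    # one (zero) control vector per time step
states, measurements = simulate_lgssm(F, B, H, Q, R, [100.0, 0.0], controls, rng)
```

Simulated trajectories of this kind are useful for checking a filter implementation, because the true hidden states are known and can be compared with the filter's estimates.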
Kalman filtering
Under the linear-Gaussian assumptions above, the posterior distribution of the state given all measurements so far is itself Gaussian, so the Kalman filter only needs to maintain its mean and covariance.
Prediction step
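Given the previous posterior $\mathcal{N}(\hat{x}_{k-1 \mid k-1}, P_{k-1 \mid k-1})$:
$$
\begin{aligned}
\hat{x}_{k \mid k-1} &= F \hat{x}_{k-1 \mid k-1} + B u_k, \\
P_{k \mid k-1} &= F P_{k-1 \mid k-1} F^\top + Q.
\end{aligned}
$$
Here, $\hat{x}_{k \mid k-1}$ and $P_{k \mid k-1}$ are the predicted state mean and covariance.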
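The prediction step can be sketched in a few lines of NumPy. The mean is propagated through the state transition matrix `F` (plus `B @ u` when a control input is present), and the covariance is propagated as `F P F^T + Q`. The function name `kf_predict` and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def kf_predict(x_post, P_post, F, Q, B=None, u=None):
    """Propagate the previous posterior mean and covariance one step ahead."""
    x_pred = F @ x_post
    if B is not None and u is not None:
        x_pred = x_pred + B @ u          # optional control contribution
    P_pred = F @ P_post @ F.T + Q        # uncertainty grows by the process noise Q
    return x_pred, P_pred


# Illustrative (assumed) values: state = [price level, drift].
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)
x_post = np.array([100.0, 0.5])
P_post = np.eye(2)

x_pred, P_pred = kf_predict(x_post, P_post, F, Q)
# x_pred is [100.5, 0.5]: the level advances by the drift, the drift is unchanged.
```

Because `Q` is added at every step, the predicted covariance is never smaller than the propagated posterior covariance. Prediction alone can only increase uncertainty.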
Update step
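With a new measurement $z_k$:
- Innovation (measurement residual): $y_k = z_k - H \hat{x}_{k \mid k-1}$
- Innovation covariance: $S_k = H P_{k \mid k-1} H^\top + R$
- Kalman gain: $K_k = P_{k \mid k-1} H^\top S_k^{-1}$
- Updated mean and covariance: $\hat{x}_{k \mid k} = \hat{x}_{k \mid k-1} + K_k y_k$ and $P_{k \mid k} = (I - K_k H) P_{k \mid k-1}$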
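The four quantities of the update step map directly onto code. The sketch below assumes an observation matrix `H` and a measurement noise covariance `R`. The name `kf_update` and the numbers are hypothetical. The gain is obtained with `np.linalg.solve` rather than an explicit matrix inverse, which is a common numerical choice and not something the derivation itself prescribes.

```python
import numpy as np


def kf_update(x_pred, P_pred, z, H, R):
    """Correct the predicted mean and covariance with a new measurement z."""
    y = z - H @ x_pred                        # innovation (measurement residual)
    S = H @ P_pred @ H.T + R                  # innovation covariance
    # Kalman gain K = P H^T S^{-1}; since P and S are symmetric,
    # K^T = S^{-1} H P, which we solve for without forming S^{-1}.
    K = np.linalg.solve(S, H @ P_pred).T
    x_post = x_pred + K @ y                   # updated mean
    P_post = (np.eye(len(x_pred)) - K @ H) @ P_pred   # updated covariance
    return x_post, P_post


# Illustrative (assumed) values: state = [price level, drift], level observed.
x_pred = np.array([100.5, 0.5])
P_pred = np.array([[2.01, 1.0],
                   [1.0, 1.01]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.01]])
z = np.array([100.4])

x_post, P_post = kf_update(x_pred, P_pred, z, H, R)
```

In this example the measurement noise is small relative to the predicted variance of the level, so the gain on the level is close to one and the updated level lands near the observed value. The drift is corrected as well, through its covariance with the level.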
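The filter then proceeds recursively: the updated mean and covariance at one time step serve as the previous posterior for the prediction step at the next.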
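Putting prediction and update together gives the full recursion. The sketch below runs it on a scalar local-level model (a random walk observed with noise), which is one simple way to frame mid-price filtering and is used here only as an assumption for illustration. The function name `kalman_filter`, the observations, and the noise levels are all hypothetical, and the control input is omitted.

```python
import numpy as np


def kalman_filter(zs, x0, P0, F, H, Q, R):
    """Run the predict/update recursion over a sequence of measurements."""
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    means, covs = [], []
    for z in zs:
        # Prediction step (no control input in this sketch).
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step.
        y = z - H @ x                          # innovation
        S = H @ P @ H.T + R                    # innovation covariance
        K = np.linalg.solve(S, H @ P).T        # Kalman gain, K = P H^T S^{-1}
        x = x + K @ y
        P = (np.eye(len(x)) - K @ H) @ P
        means.append(x)
        covs.append(P)
    return np.array(means), np.array(covs)


# Local-level model: the latent mid-price follows a random walk.
F = np.array([[1.0]])
H = np.array([[1.0]])
Q = np.array([[1e-4]])   # assumed process noise variance
R = np.array([[1e-2]])   # assumed measurement noise variance

# Hypothetical noisy mid-price observations, not real market data.
zs = np.array([[100.02], [100.01], [100.04], [100.03]])
means, covs = kalman_filter(zs, x0=[100.0], P0=[[1.0]], F=F, H=H, Q=Q, R=R)
```

With a diffuse initial covariance the first update moves the estimate almost all the way to the first observation. Later updates are progressively more conservative as the posterior variance shrinks.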