Introduction

In this book, we explore mid-price prediction in financial markets through the combined lens of statistical filtering and machine learning. The mid-price—halfway between the best bid and best ask—captures the evolving consensus of market participants and serves as a natural target for short-term price forecasting.
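As a concrete illustration of the definition above, the mid-price can be computed from the top of the book in a few lines. This is a minimal sketch: the function name `mid_price` and the crossed-book check are assumptions made for illustration, not part of the book's code base.

```python
def mid_price(best_bid: float, best_ask: float) -> float:
    """Return the mid-price, i.e. the average of the best bid and best ask.

    Hypothetical helper for illustration only.
    """
    if best_bid > best_ask:
        # A bid above the ask (a "crossed" book) usually signals bad data.
        raise ValueError("crossed book: best bid exceeds best ask")
    return (best_bid + best_ask) / 2.0


# Example: a bid of 100.00 and an ask of 100.50
print(mid_price(100.0, 100.5))
```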

We begin by implementing a Kalman Filter as a statistical baseline for sequential state estimation. From there, we train and evaluate a range of machine learning models to assess how modern approaches compare with classical inference methods.

Our goals are two-fold:

  • Implement classical inference algorithms such as the Kalman Filter in C++ for efficiency and precision, with Python bindings for experimentation.
  • Compare these algorithms against machine learning models in terms of predictive accuracy, robustness, and computational performance.

The comparison will be carried out on classical mid-price forecasting datasets, including:

  • FI-2010: a publicly available benchmark dataset for mid-price forecasting from limit order book data
  • LOBSTER: a real limit order book dataset with millisecond-level resolution

By the end, we should have a practical understanding of how statistical filters and machine learning can be applied to mid-price prediction.

Model formulation

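We consider a linear dynamical system with additive Gaussian noise:

$$
x_k = F x_{k-1} + B u_k + w_k, \qquad w_k \sim \mathcal{N}(0, Q)
$$

$$
z_k = H x_k + v_k, \qquad v_k \sim \mathcal{N}(0, R)
$$

where:

  • $x_k$ is the state vector at time step $k$,
  • $u_k$ is an optional control input,
  • $z_k$ is the measurement vector,
  • $F$ is the state transition matrix,
  • $B$ is the control-input matrix,
  • $H$ is the observation matrix,
  • $Q$ is the process noise covariance,
  • $R$ is the measurement noise covariance.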
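To make the model concrete, the following sketch simulates a scalar instance of it, in which every matrix reduces to a single number. The function name `simulate`, the lowercase parameter names (`f`, `b`, `h`, `q`, `r`) and all default values are assumptions chosen for illustration; they do not come from any dataset used in this book.

```python
import random


def simulate(n_steps, f=1.0, b=0.0, h=1.0, q=0.01, r=0.25,
             x0=100.0, controls=None, seed=0):
    """Simulate a scalar linear-Gaussian state-space model.

    State:       x_k = f * x_{k-1} + b * u_k + w_k,  w_k ~ N(0, q)
    Measurement: z_k = h * x_k + v_k,                v_k ~ N(0, r)

    All names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    if controls is None:
        controls = [0.0] * n_steps  # no control input by default
    states, measurements = [], []
    x = x0
    for u in controls[:n_steps]:
        # random.gauss expects a standard deviation, hence the square roots.
        x = f * x + b * u + rng.gauss(0.0, q ** 0.5)
        z = h * x + rng.gauss(0.0, r ** 0.5)
        states.append(x)
        measurements.append(z)
    return states, measurements


states, measurements = simulate(5)
for x, z in zip(states, measurements):
    print(f"state={x:.3f}  measurement={z:.3f}")
```

With `f = 1` and `b = 0` the state follows a random walk, which is a common simple assumption for a latent price level observed through noisy measurements.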

Kalman filtering

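The Kalman filter maintains the mean and covariance of the posterior distribution of the state given all measurements so far, $p(x_k \mid z_{1:k}) = \mathcal{N}(\hat{x}_{k|k}, P_{k|k})$, which remains Gaussian under the linear-Gaussian assumption.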

Prediction step

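Given the previous posterior $\mathcal{N}(\hat{x}_{k-1|k-1}, P_{k-1|k-1})$:

$$
\hat{x}_{k|k-1} = F \hat{x}_{k-1|k-1} + B u_k
$$

$$
P_{k|k-1} = F P_{k-1|k-1} F^\top + Q
$$

Here, $\hat{x}_{k|k-1}$ and $P_{k|k-1}$ are the predicted state mean and covariance.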
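The prediction step can be sketched in plain Python with matrices stored as lists of rows. This is an illustrative sketch rather than the book's C++ implementation: the helper names (`matmul`, `transpose`, `predict`) and the two-dimensional "level plus drift" example state are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def predict(x, P, F, Q, B=None, u=None):
    """Propagate the posterior mean x and covariance P one step ahead.

    Mean:       x_pred = F x (+ B u if a control input is given)
    Covariance: P_pred = F P F^T + Q
    """
    n = len(x)
    x_pred = [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]
    if B is not None and u is not None:
        for i in range(n):
            x_pred[i] += sum(B[i][j] * u[j] for j in range(len(u)))
    FPFt = matmul(matmul(F, P), transpose(F))
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    return x_pred, P_pred


# Hypothetical example: state = [price level, drift per step]
F = [[1.0, 1.0], [0.0, 1.0]]
Q = [[0.01, 0.0], [0.0, 0.01]]
x_pred, P_pred = predict([100.0, 0.5], [[1.0, 0.0], [0.0, 1.0]], F, Q)
print(x_pred)
print(P_pred)
```

Note that the predicted covariance is never smaller than the propagated one: without a new measurement, adding $Q$ can only increase uncertainty.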

Update step

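With a new measurement $z_k$:

  • Innovation (measurement residual): $y_k = z_k - H \hat{x}_{k|k-1}$
  • Innovation covariance: $S_k = H P_{k|k-1} H^\top + R$
  • Kalman gain: $K_k = P_{k|k-1} H^\top S_k^{-1}$
  • Updated mean and covariance: $\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k y_k$ and $P_{k|k} = (I - K_k H) P_{k|k-1}$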
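The four quantities above can be sketched for the common special case of a scalar measurement, such as a single observed mid-price. In that case the observation matrix is a single row and the innovation covariance is a number, so no matrix inversion is needed. The function name `update` and the example values are assumptions for illustration only.

```python
def update(x_pred, P_pred, z, H, R):
    """Kalman update for a scalar measurement z.

    x_pred: predicted mean (list of length n)
    P_pred: predicted covariance (n x n, list of rows)
    H:      observation row vector (list of length n)
    R:      measurement noise variance (scalar)
    """
    n = len(x_pred)
    # Innovation: measurement minus predicted measurement.
    y = z - sum(H[i] * x_pred[i] for i in range(n))
    # P H^T, a column vector because H has a single row.
    PHt = [sum(P_pred[i][j] * H[j] for j in range(n)) for i in range(n)]
    # Innovation covariance S = H P H^T + R (a scalar here).
    S = sum(H[i] * PHt[i] for i in range(n)) + R
    # Kalman gain K = P H^T / S.
    K = [PHt[i] / S for i in range(n)]
    # Updated mean.
    x_new = [x_pred[i] + K[i] * y for i in range(n)]
    # Updated covariance (I - K H) P, written as P - K (H P).
    HP = [sum(H[k] * P_pred[k][j] for k in range(n)) for j in range(n)]
    P_new = [[P_pred[i][j] - K[i] * HP[j] for j in range(n)]
             for i in range(n)]
    return x_new, P_new, y, S, K


# Hypothetical example: observe only the first state component.
x_new, P_new, y, S, K = update(
    [100.5, 0.5], [[2.01, 1.0], [1.0, 1.01]], z=101.0, H=[1.0, 0.0], R=0.25
)
print(x_new)
print(P_new)
```

The form $(I - K_k H) P_{k|k-1}$ is the simplest to write but can lose symmetry or positive definiteness in floating point; the Joseph form is a common, numerically safer alternative.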

The filter proceeds recursively for each time step.
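The recursion can be sketched end to end for the simplest case, a scalar random-walk ("local level") model with $F = 1$, $H = 1$ and no control input. The function name `kalman_filter_1d`, the noise variances `q` and `r`, and the sample measurements are illustrative assumptions, not values taken from FI-2010 or any other dataset.

```python
def kalman_filter_1d(measurements, q, r, x0, p0):
    """Run a scalar Kalman filter over a sequence of measurements.

    Assumes x_k = x_{k-1} + w_k with Var(w_k) = q, and
            z_k = x_k + v_k     with Var(v_k) = r.
    Returns the posterior means and variances after each update.
    """
    x, p = x0, p0
    means, variances = [], []
    for z in measurements:
        # Prediction step: the mean is unchanged, the variance grows by q.
        x_pred, p_pred = x, p + q
        # Update step.
        y = z - x_pred        # innovation
        s = p_pred + r        # innovation variance
        k = p_pred / s        # Kalman gain
        x = x_pred + k * y
        p = (1.0 - k) * p_pred
        means.append(x)
        variances.append(p)
    return means, variances


# Hypothetical noisy mid-price observations
observed = [100.2, 99.9, 100.4, 100.1, 100.3]
means, variances = kalman_filter_1d(observed, q=0.01, r=0.25, x0=100.0, p0=1.0)
for m, v in zip(means, variances):
    print(f"mean={m:.4f}  variance={v:.4f}")
```

In this time-invariant setting the posterior variance does not depend on the data and converges to a fixed point, so the gain settles to a constant and the filter behaves like an exponentially weighted moving average of the measurements.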