Anticipative Bayesian classification for data streams with verification latency

J Appl Stat. 2024 Feb 21;51(14):2812-2831. doi: 10.1080/02664763.2024.2319222. eCollection 2024.

Abstract

Most existing adaptive classification algorithms for non-stationary data streams require recent labelled data for their updates. Such recent labels are often missing. Only a few approaches exist for stream classification under verification latency, and most of them assume clustered data or homogeneous drift in all features, which limits their applicability. We address this by proposing the Anticipative Bayesian stream Classifier (ABClass), an approach that is capable of integrating and automatically selecting from different components. In its Bayesian classification framework, ABClass combines density estimation techniques, extended to extrapolate drift patterns over time, with unsupervised parameter tuning and unsupervised model selection. ABClass allows for multivariate density estimation and extrapolation techniques. In this work, we assume conditional independence between features given the class label in order to model feature-specific drift patterns. ABClass is generative and can also be used for explaining and visualising concept drift patterns. It is generic, making it easy to include further types of drift models, both for the class-conditional feature distribution and for the class prior distribution. The experimental evaluation on several real-world data streams shows that it is competitive with other state-of-the-art approaches. In most cases, ABClass is ten to a hundred times faster than its competitors, both for model fitting and for prediction.
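
The idea in the abstract of extrapolating class-conditional densities over time within a Bayesian classifier can be illustrated with a small sketch. This is not the ABClass implementation: it assumes Gaussian class-conditional densities whose per-feature means follow a linear trend in time, keeps the class prior fixed, and omits the unsupervised parameter tuning and model selection. The function names `fit_drifting_gaussians` and `predict_at` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_drifting_gaussians(X, y, t):
    """Fit, per class and feature, a linear trend of the mean over time.

    Assumption for this sketch: features are conditionally independent
    given the class, Gaussian, with a time-varying mean and constant spread.
    """
    model = {}
    for c in np.unique(y):
        mask = y == c
        Xc, tc = X[mask], t[mask]
        # Least-squares line per feature: mean_j(t) = slope_j * t + intercept_j
        slope, intercept = np.polyfit(tc, Xc, deg=1)
        resid = Xc - (np.outer(tc, slope) + intercept)
        model[c] = {
            "slope": slope,
            "intercept": intercept,
            "std": resid.std(axis=0) + 1e-9,  # avoid division by zero
            "prior": mask.mean(),  # static prior (simplification)
        }
    return model


def predict_at(model, X_new, t_new):
    """Classify X_new with densities extrapolated to time t_new."""
    classes = sorted(model)
    log_post = []
    for c in classes:
        m = model[c]
        mean = m["slope"] * t_new + m["intercept"]  # extrapolated mean
        z = (X_new - mean) / m["std"]
        log_lik = -0.5 * z**2 - np.log(m["std"]) - 0.5 * np.log(2 * np.pi)
        # Sum of per-feature log densities = conditional independence
        log_post.append(np.log(m["prior"]) + log_lik.sum(axis=1))
    return np.array(classes)[np.argmax(np.column_stack(log_post), axis=1)]


# Synthetic stream: both class means drift upward by 1 unit per time step.
rng = np.random.default_rng(0)
n = 200
t_train = rng.uniform(0.0, 5.0, size=2 * n)
y_train = np.repeat([0, 1], n)
X_train = (t_train + 4.0 * y_train + rng.normal(0.0, 0.5, size=2 * n))[:, None]

# Labels for t = 10 are not yet available (verification latency),
# so the model fitted on t in [0, 5] must anticipate the drift.
t_test = 10.0
y_test = np.repeat([0, 1], n)
X_test = (t_test + 4.0 * y_test + rng.normal(0.0, 0.5, size=2 * n))[:, None]

model = fit_drifting_gaussians(X_train, y_train, t_train)
acc = (predict_at(model, X_test, t_test) == y_test).mean()
print(f"accuracy with extrapolated densities: {acc:.3f}")
```

In this toy setting, a classifier frozen at the training-time densities would assign nearly every test point to class 1, because both classes have moved past the old decision boundary. Extrapolating the fitted trend to the prediction time is what keeps the two classes separable.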

Keywords: Data streams; concept drift; label delay; non-stationary environments; temporal transfer learning; verification latency.

Grants and funding

This work was supported by Oesterreichische Nationalbank [17028].