We study algorithms for online change-point detection (OCPD), in which potentially heavy-tailed samples arrive one at a time and a change in the underlying mean must be detected as early as possible. We present an algorithm based on clipped stochastic gradient descent (SGD) that works even when we assume only that the second moment of the data-generating process is bounded. We derive guarantees on the worst-case, finite-sample false-positive rate (FPR) over the family of all distributions with bounded second moment. Our method is thus the first OCPD algorithm that guarantees a finite-sample FPR even when the data are high-dimensional and the underlying distributions are heavy-tailed. The technical contribution of the paper is to show that clipped SGD can estimate the mean of a random vector while simultaneously providing confidence bounds at all confidence levels. We combine this robust estimate with a union-bound argument to construct a sequential change-point algorithm with finite-sample FPR guarantees. We show empirically that our algorithm works well in a variety of situations, whether the underlying data are heavy-tailed, light-tailed, high-dimensional, or discrete. No other algorithm achieves a bounded FPR, theoretically or empirically, simultaneously across all the settings we study.
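To make the idea of mean estimation by clipped SGD concrete, here is a minimal sketch. It runs SGD on the squared loss $\tfrac{1}{2}\lVert x - X_t \rVert^2$, whose stochastic gradient is $x - X_t$, and rescales each gradient so that its norm never exceeds a threshold. The function names, the clipping threshold `lam`, and the step size $1/t$ are illustrative assumptions; the abstract does not specify the paper's schedule, and this sketch does not compute the confidence bounds or the change-point test.

```python
import numpy as np


def clip(g, lam):
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(g)
    if norm <= lam:
        return g
    return g * (lam / norm)


def clipped_sgd_mean(samples, lam):
    """Estimate a mean with clipped SGD on the loss 0.5 * ||x - X_t||^2.

    samples: array of shape (n, d), or (n,) for scalar data.
    lam: clipping threshold (hypothetical tuning parameter).
    The step size 1/t is an assumption made for this sketch.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, None]  # treat scalars as 1-dimensional vectors
    x = np.zeros(samples.shape[1])
    for t, sample in enumerate(samples, start=1):
        grad = x - sample            # stochastic gradient of the squared loss
        x = x - (1.0 / t) * clip(grad, lam)
    return x
```

With `lam = np.inf` the update reduces to the running sample mean, while a finite `lam` limits how far any single sample can move the estimate (at most `lam / t` at step `t`), which is what gives robustness to heavy-tailed outliers.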