We study algorithms for online change-point detection (OCPD), where potentially heavy-tailed samples are presented one at a time and a change in the underlying mean must be detected as early as possible. We present an algorithm based on clipped stochastic gradient descent (SGD) that works even if we assume only that the second moment of the data-generating process is bounded. We derive guarantees on the worst-case, finite-sample false-positive rate (FPR) over the family of all distributions with bounded second moment. Thus, our method is the first OCPD algorithm that guarantees finite-sample FPR even if the data are high-dimensional and the underlying distributions are heavy-tailed. The technical contribution of our paper is to show that clipped SGD can estimate the mean of a random vector and simultaneously provide confidence bounds at all confidence values. We combine this robust estimate with a union-bound argument to construct a sequential change-point algorithm with finite-sample FPR guarantees. We show empirically that our algorithm works well in a variety of situations, whether the underlying data are heavy-tailed, light-tailed, high-dimensional, or discrete. No other algorithm achieves bounded FPR, theoretically or empirically, across all the settings we study simultaneously.
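To make the clipped-SGD mean estimator concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the estimator minimizes the objective $\tfrac{1}{2}\,\mathbb{E}\lVert\theta - X\rVert^2$, whose stochastic gradient at a sample $x$ is $\theta - x$. The function names `clip` and `clipped_sgd_mean`, the step size $1/t$, and the clipping threshold `clip_norm` are illustrative assumptions; the abstract does not specify them, and the confidence bounds and change-point test are omitted.

```python
import math


def clip(vec, max_norm):
    """Rescale vec so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def clipped_sgd_mean(samples, clip_norm, dim):
    """Estimate the mean of a stream of vectors with clipped SGD.

    Hypothetical sketch: step size 1/t and clip_norm are illustrative
    choices, not values taken from the paper.
    """
    theta = [0.0] * dim
    for t, x in enumerate(samples, start=1):
        # Stochastic gradient of 0.5 * ||theta - x||^2 with respect to theta.
        grad = [th - xi for th, xi in zip(theta, x)]
        # Clipping bounds the influence of any single (possibly extreme) sample.
        g = clip(grad, clip_norm)
        step = 1.0 / t
        theta = [th - step * gi for th, gi in zip(theta, g)]
    return theta
```

With the $1/t$ step size and a threshold large enough that clipping never triggers, the iterate reduces to the running sample mean. A smaller threshold limits how far one extreme sample can move the estimate, at the cost of some bias.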