Online changepoint detection aims to detect anomalies and changes in real time in high-frequency data streams, sometimes with limited computational resources. This is an important task rooted in many real-world applications, including but not limited to cybersecurity, medicine and astrophysics. While fast and efficient online algorithms have recently been introduced, they rely on parametric assumptions that are often violated in practice. Motivated by data streams from the telecommunications sector, we build a flexible nonparametric approach to detect a change in the distribution of a sequence. Our procedure, NP-FOCuS, builds a sequential likelihood ratio test for a change at a set of points of the empirical cumulative distribution function of the data. This is achieved by keeping track of the number of observations above or below those points. Thanks to functional pruning ideas, NP-FOCuS has a computational cost that is log-linear in the number of observations, making it suitable for high-frequency data streams. In terms of detection power, NP-FOCuS is seen to outperform current nonparametric online changepoint techniques in a variety of settings. We demonstrate the utility of the procedure on both simulated and real data.
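To make the counting idea concrete, the following is a minimal sketch and not the NP-FOCuS algorithm itself. It assumes that each observation is reduced to an indicator of whether it lies at or below a fixed point, that the resulting Bernoulli likelihood ratio statistics are summed over the points for a common candidate changepoint, and that the maximum over candidate changepoints is found by brute force. It therefore costs time linear in the number of observations per update, with none of the functional pruning that gives NP-FOCuS its log-linear cost. The names `NaiveEcdfDetector`, `bernoulli_loglik` and `first_detection`, as well as the choice of points and threshold, are hypothetical.

```python
import math
import random


def bernoulli_loglik(k, m):
    """Maximised Bernoulli log-likelihood for k successes in m trials (0 * log 0 = 0)."""
    if m == 0 or k == 0 or k == m:
        return 0.0
    p = k / m
    return k * math.log(p) + (m - k) * math.log(1.0 - p)


class NaiveEcdfDetector:
    """Brute-force sequential likelihood ratio test on counts at fixed points.

    Assumption: statistics are summed over the points for a common candidate
    changepoint tau. This is an illustrative choice, not taken from the paper.
    """

    def __init__(self, points, threshold):
        self.points = list(points)
        self.threshold = threshold
        self.n = 0
        # cum[j][t] = number of the first t observations that are <= points[j]
        self.cum = [[0] for _ in self.points]

    def statistic(self):
        """Max over tau of twice the log-likelihood ratio, summed over points."""
        n = self.n
        best = 0.0
        for tau in range(1, n):  # change assumed to occur right after time tau
            total = 0.0
            for c in self.cum:
                k_all, k_pre = c[n], c[tau]
                total += 2.0 * (
                    bernoulli_loglik(k_pre, tau)
                    + bernoulli_loglik(k_all - k_pre, n - tau)
                    - bernoulli_loglik(k_all, n)
                )
            best = max(best, total)
        return best

    def update(self, x):
        """Consume one observation; return (statistic, threshold exceeded?)."""
        self.n += 1
        for j, q in enumerate(self.points):
            self.cum[j].append(self.cum[j][-1] + (1 if x <= q else 0))
        stat = self.statistic()
        return stat, stat > self.threshold


def first_detection(stream, points, threshold):
    """Return the 1-based time of the first threshold crossing, or None."""
    detector = NaiveEcdfDetector(points, threshold)
    for t, x in enumerate(stream, start=1):
        _, alarm = detector.update(x)
        if alarm:
            return t
    return None


if __name__ == "__main__":
    random.seed(0)
    # Simulated stream: the mean shifts from 0 to 2 after 100 observations.
    stream = [random.gauss(0.0, 1.0) for _ in range(100)]
    stream += [random.gauss(2.0, 1.0) for _ in range(100)]
    # Points and threshold are arbitrary illustrative values.
    print(first_detection(stream, points=[-0.5, 0.0, 0.5], threshold=30.0))
```

In practice the threshold would be calibrated, for example by simulation to control the false alarm rate, and the points could be chosen as quantiles of training data. Both choices are left arbitrary here.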