Change point detection in time series seeks to identify the times at which the probability distribution of a time series changes. It is widely applied in areas such as human-activity sensing and medical science. For multivariate time series, this typically involves examining the joint distribution of high-dimensional data: if any one variable changes, the whole time series is assumed to have changed. In practical applications, however, we may be interested only in certain components of the time series and want to detect abrupt changes in their distributions in the presence of the other series. Here, assuming an underlying structural causal model that governs the generation of the time-series data, we address this problem by proposing a two-stage non-parametric algorithm. The first stage learns parts of the causal structure through constraint-based discovery methods. The second stage uses conditional relative Pearson divergence estimation to identify the change points. The conditional relative Pearson divergence quantifies the distribution disparity between consecutive segments of the time series, while the causal discovery method enables a focus on the causal mechanism, facilitating access to independent and identically distributed (IID) samples. Theoretically, the typical assumption in conventional change point detection methods that samples are IID can be relaxed based on the Causal Markov Condition. Through experiments on both synthetic and real-world datasets, we validate the correctness and utility of our approach.
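To make the divergence-based scoring idea concrete, the following is a minimal sketch of how a relative Pearson divergence between two consecutive segments might be estimated and turned into a change score. It is an illustration under stated assumptions, not the proposed algorithm: it uses the unconditional, kernel-based least-squares estimator (in the style of RuLSIF) rather than the conditional variant described above, and it omits the causal discovery stage entirely. The function names, the fixed Gaussian bandwidth `sigma`, the ridge parameter `lam`, and the window size are all hypothetical choices made for the example; in practice such hyperparameters would usually be tuned, for instance by cross-validation.

```python
import numpy as np


def _gaussian_kernel(a, centers, sigma):
    # Gaussian kernel matrix between rows of a (n, d) and centers (b, d).
    sq = ((a[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))


def relative_pearson_divergence(x, y, alpha=0.1, sigma=1.0, lam=0.1, n_centers=50):
    """Estimate the alpha-relative Pearson divergence from samples x to samples y.

    Assumption: unconditional least-squares estimator with a kernel model
    g(z) = sum_l theta_l * K(z, c_l), centers c_l taken from x.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Deterministic, evenly spaced choice of kernel centers from x.
    idx = np.linspace(0, len(x) - 1, min(n_centers, len(x))).astype(int)
    centers = x[idx]

    kx = _gaussian_kernel(x, centers, sigma)
    ky = _gaussian_kernel(y, centers, sigma)

    # Regularized least-squares fit of the relative density ratio.
    H = alpha * kx.T @ kx / len(x) + (1.0 - alpha) * ky.T @ ky / len(y)
    h = kx.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)

    # Plug the fitted ratio values into the divergence approximation.
    gx, gy = kx @ theta, ky @ theta
    return (-alpha * np.mean(gx ** 2) / 2.0
            - (1.0 - alpha) * np.mean(gy ** 2) / 2.0
            + np.mean(gx) - 0.5)


def change_scores(series, window=50, **kwargs):
    """Symmetrized divergence between the windows just before and just after each time t."""
    series = np.asarray(series, dtype=float)
    scores = np.full(len(series), np.nan)  # NaN where a full window is unavailable
    for t in range(window, len(series) - window + 1):
        past, future = series[t - window:t], series[t:t + window]
        scores[t] = (relative_pearson_divergence(past, future, **kwargs)
                     + relative_pearson_divergence(future, past, **kwargs))
    return scores
```

Calling `change_scores(series, window=50)` on a one-dimensional array returns one score per time index, and candidate change points are the indices where the score peaks. A conditional version in the spirit of the abstract would compare the distribution of a target component given its learned causal parents, which this sketch does not attempt.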