Detecting and localizing change points in sequential data is of interest in many areas of application. Various notions of change points have been proposed, such as changes in the mean, the variance, or the linear regression coefficients. In this work, we consider settings in which a response variable $Y$ and a set of covariates $X=(X^1,\ldots,X^{d+1})$ are observed over time, and we aim to find changes in the causal mechanism generating $Y$ from $X$. More specifically, we assume that $Y$ depends linearly on a subset of the covariates and aim to determine the time points at which either the dependence on that subset or the subset itself changes. We call these time points causal change points (CCPs) and show that they form a subset of the commonly studied regression change points. We propose a general methodology to both detect and localize CCPs. Although motivated by causality, we define CCPs without referencing an underlying causal model. The proposed definition of CCPs exploits a notion of invariance, which is a purely observational quantity but, under additional assumptions, has a causal meaning. For CCP localization, we propose a loss function that can be combined with existing multiple change point algorithms to localize multiple CCPs efficiently. We evaluate and illustrate our methods on simulated datasets.
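To make the notion of a regression change point concrete, the following minimal sketch localizes a single change in linear regression coefficients by minimizing the summed squared-error loss over two segments. It is only an illustration of the commonly studied regression setting, not the CCP loss proposed in this work; the function names `segment_rss` and `localize_single_change`, the simulated coefficients, and the minimum segment length are all assumptions made for the example.

```python
import numpy as np

def segment_rss(X, y):
    """Residual sum of squares of an OLS fit (with intercept) on one segment."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def localize_single_change(X, y, min_len=10):
    """Return the split index that minimizes the summed RSS of both segments."""
    n = len(y)
    candidates = range(min_len, n - min_len + 1)
    costs = [segment_rss(X[:k], y[:k]) + segment_rss(X[k:], y[k:])
             for k in candidates]
    return candidates[int(np.argmin(costs))]

# Simulated data (hypothetical): the response depends on covariates 1 and 3
# before the change and on covariates 2 and 3 after it.
rng = np.random.default_rng(0)
n, true_cp = 400, 250
X = rng.normal(size=(n, 3))
beta_before = np.array([1.0, 0.0, -2.0])
beta_after = np.array([0.0, 1.5, -2.0])
signal = np.where(np.arange(n) < true_cp, X @ beta_before, X @ beta_after)
y = signal + 0.1 * rng.normal(size=n)

estimated_cp = localize_single_change(X, y)
print(estimated_cp)
```

A segment-wise loss of this additive form is what allows it to be plugged into standard multiple change point search procedures; the squared-error loss here would flag any regression change point, whereas a CCP-specific loss would be needed to single out the causal ones.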