Root cause analysis of anomalies aims to identify the features that cause a deviation from the normal process. Existing methods, however, ignore that anomalies can arise through two fundamentally different processes: measurement errors, where the data were generated normally but one or more values were recorded incorrectly, and mechanism shifts, where the causal process generating the data changed. While measurement errors can often be safely corrected, mechanistic anomalies require careful consideration. We define a causal model that explicitly captures both types by treating outliers as latent interventions on latent ("true") and observed ("measured") variables. We show that both anomaly types are identifiable, and propose a maximum likelihood estimation approach to put this into practice. Experiments show that our method matches state-of-the-art performance in root cause localization, while additionally enabling accurate classification of anomaly types, and remains robust even when the causal DAG is unknown.
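The distinction between the two anomaly types can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the model or estimator proposed here: the three-variable linear chain, the edge weights, the Gaussian noise, and the helper `simulate` are all hypothetical choices made for illustration. A mechanism shift perturbs a latent ("true") variable and therefore propagates to its descendants, whereas a measurement error perturbs only the recorded ("measured") value.

```python
import numpy as np

def simulate(shift=None, error=None, seed=0):
    """Sample one point from a hypothetical linear chain Z0 -> Z1 -> Z2.

    shift: optional (node, delta), an intervention on the latent mechanism of `node`
    error: optional (node, delta), an intervention on the recorded value of `node` only
    Returns the observed ("measured") vector X.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=3)       # same seed gives the same noise in every call
    weights = [0.0, 2.0, -1.0]       # assumed weight of the edge into node i
    z = np.zeros(3)                  # latent ("true") values
    for i in range(3):
        parent = z[i - 1] if i > 0 else 0.0
        z[i] = weights[i] * parent + noise[i]
        if shift is not None and shift[0] == i:
            z[i] += shift[1]         # mechanism shift: affects downstream nodes too
    x = z.copy()                     # observed values start as the true values
    if error is not None:
        x[error[0]] += error[1]      # measurement error: affects this entry only
    return x

normal = simulate()
shifted = simulate(shift=(1, 5.0))
mismeasured = simulate(error=(1, 5.0))

print(shifted - normal)      # nonzero at node 1 and at its descendant, node 2
print(mismeasured - normal)  # nonzero at node 1 only
```

Both anomalies produce the same deviation at node 1 itself. In this sketch they differ only in whether the downstream node is also affected, which is the kind of signal that makes the two types distinguishable.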