Root Cause Analysis (RCA) aims to identify the underlying causes of system faults by uncovering and analyzing the causal structure of complex systems. It is widely used across many application domains, where reliable diagnostic conclusions are essential for mitigating system failures and financial losses. However, previous studies implicitly assume full observation of the system and thus neglect the effect of partial observation (i.e., missing nodes and latent malfunction). As a result, they fail to derive reliable RCA results. In this paper, we unveil the issues of unobserved confounders and unobserved heterogeneity that arise under partial observation, and formulate a new problem: root cause analysis with partially observed data. To address it, we propose PORCA, a novel RCA framework that can identify reliable root causes in the presence of both unobserved confounders and unobserved heterogeneity. PORCA leverages magnified score-based causal discovery to efficiently optimize an acyclic directed mixed graph under unobserved confounders. In addition, we develop a heterogeneity-aware scheduling strategy that provides adaptive sample weights. Extensive experimental results on one synthetic and two real-world datasets demonstrate the effectiveness and superiority of the proposed framework.