Causal discovery methods aim to determine the causal direction between variables from observational data. Functional causal discovery methods, such as those based on the Linear Non-Gaussian Acyclic Model (LiNGAM), rely on structural and distributional assumptions to infer the causal direction. However, approaches for assessing how the performance of causal discovery methods depends on sample size, or on the assumption violations that are inevitable in real-world settings, are lacking. To address this need, we propose the Causal Direction Detection Rate (CDDR) diagnostic, which evaluates whether and to what extent the interaction between assumption violations and sample size affects the ability to identify the hypothesized causal direction. Given a bivariate dataset of size N on a pair of variables, X and Y, the CDDR diagnostic is a plot comparing the probabilities of each causal discovery outcome (e.g., X causes Y, Y causes X, or inconclusive) as functions of sample sizes smaller than N. We fully develop the CDDR diagnostic for the bivariate case and demonstrate its use with two methods: LiNGAM and our new test-based causal discovery approach. We find the CDDR diagnostic for the test-based approach to be more informative because it uses a richer set of causal discovery outcomes. Under certain assumptions, we prove that the estimators of the probability of each possible causal discovery outcome are consistent and asymptotically normal. Through simulations, we study the behavior of the CDDR diagnostic when the linearity and non-Gaussianity assumptions are violated. Additionally, we illustrate the CDDR diagnostic on four real datasets, including three for which the causal direction is known.
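As a rough illustration of the idea described above, the following Python sketch estimates, for several subsample sizes m < N, the proportion of random subsamples that yield each causal discovery outcome. It is a minimal sketch under stated assumptions, not the procedure developed in the paper: the function names (`cddr_curve`, `toy_direction`), the subsampling scheme (drawing without replacement), the number of draws, and the decision rule are all hypothetical choices made for illustration. In particular, `toy_direction` is a simple residual-dependence heuristic standing in for LiNGAM or the test-based approach, neither of which is reproduced here.

```python
import numpy as np

OUTCOMES = ("X->Y", "Y->X", "inconclusive")


def _residual(target, regressor):
    """Residual of a least-squares line fit of target on regressor."""
    slope, intercept = np.polyfit(regressor, target, 1)
    return target - (slope * regressor + intercept)


def _dependence(u, v):
    """Toy dependence score: |corr(u^2, v^2)| after centering (an assumption)."""
    u = u - u.mean()
    v = v - v.mean()
    return abs(np.corrcoef(u**2, v**2)[0, 1])


def toy_direction(x, y, margin=0.05):
    """Hypothetical stand-in decision rule, NOT LiNGAM or the paper's test.

    In a linear non-Gaussian model the residual is independent of the
    regressor only in the causal direction, so the direction with the
    smaller dependence score is preferred. Scores closer than `margin`
    are reported as inconclusive.
    """
    dep_xy = _dependence(x, _residual(y, x))  # small if X -> Y fits well
    dep_yx = _dependence(y, _residual(x, y))  # small if Y -> X fits well
    if abs(dep_xy - dep_yx) < margin:
        return "inconclusive"
    return "X->Y" if dep_xy < dep_yx else "Y->X"


def cddr_curve(x, y, sizes, n_draws=200, decide=toy_direction, seed=0):
    """Estimate outcome proportions at each subsample size m < N.

    Subsamples are drawn without replacement (an assumed scheme).
    Returns {m: {outcome: proportion}}.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    curve = {}
    for m in sizes:
        if not 2 < m < n:
            raise ValueError("each subsample size must satisfy 2 < m < N")
        counts = dict.fromkeys(OUTCOMES, 0)
        for _ in range(n_draws):
            idx = rng.choice(n, size=m, replace=False)
            counts[decide(x[idx], y[idx])] += 1
        curve[m] = {k: c / n_draws for k, c in counts.items()}
    return curve


if __name__ == "__main__":
    # Synthetic linear non-Gaussian example in which X causes Y.
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 500)
    y = 0.8 * x + rng.uniform(-1, 1, 500)
    for m, probs in cddr_curve(x, y, sizes=[50, 100, 200, 400]).items():
        print(m, probs)
```

Plotting the three proportions against m would give a picture in the spirit of the diagnostic. Passing a different `decide` callable lets the same loop be reused with another causal discovery method that returns one of the listed outcomes.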