Causal discovery methods aim to determine the causal direction between variables using observational data. Functional causal discovery methods, such as those based on the Linear Non-Gaussian Acyclic Model (LiNGAM), rely on structural and distributional assumptions to infer the causal direction. However, approaches for assessing how these methods perform as a function of sample size, or under the assumption violations that are inevitable in real-world scenarios, are lacking. To address this need, we propose the Causal Direction Detection Rate (CDDR) diagnostic, which evaluates whether and to what extent the interaction between assumption violations and sample size affects the ability to identify the hypothesized causal direction. Given a bivariate dataset of size N on a pair of variables, X and Y, the CDDR diagnostic is a plot comparing the probabilities of each causal discovery outcome (e.g., X causes Y, Y causes X, or inconclusive) as a function of sample sizes smaller than N. We fully develop the CDDR diagnostic in the bivariate case and demonstrate its use for two methods: LiNGAM and our new test-based causal discovery approach. We find the CDDR diagnostic for the test-based approach to be more informative because it uses a richer set of causal discovery outcomes. Under certain assumptions, we prove that the estimated probabilities of each possible causal discovery outcome are consistent and asymptotically normal. Through simulations, we study the behavior of the CDDR diagnostic when the linearity and non-Gaussianity assumptions are violated. Additionally, we illustrate the CDDR diagnostic on four real datasets, including three for which the causal direction is known.
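To make the construction concrete, the following is a minimal Python sketch of how detection rates of this kind could be estimated. It is an illustration under stated assumptions, not the procedure developed in the paper: the decision rule `toy_direction` is a hypothetical stand-in for LiNGAM or the test-based approach, the names `cddr_curve`, `margin`, and `n_rep` are invented here, and subsampling without replacement is assumed as the way to obtain samples of size n < N.

```python
import numpy as np

OUTCOMES = ("X->Y", "Y->X", "inconclusive")


def _dependence(cause, effect):
    """Toy score: nonlinear correlation between the OLS residual and the regressor.

    Hypothetical stand-in; not the statistic used by LiNGAM or by the paper.
    """
    c = cause - cause.mean()
    e = effect - effect.mean()
    slope = (c @ e) / (c @ c)          # least-squares slope of effect on cause
    resid = e - slope * c              # residual, linearly uncorrelated with c
    a = abs(np.corrcoef(np.tanh(resid / resid.std()), c)[0, 1])
    b = abs(np.corrcoef(resid, np.tanh(c / c.std()))[0, 1])
    return a + b


def toy_direction(x, y, margin=0.02):
    """Return one of OUTCOMES; `margin` is an arbitrary illustrative threshold."""
    d_xy = _dependence(x, y)           # residual dependence when modelling X -> Y
    d_yx = _dependence(y, x)           # residual dependence when modelling Y -> X
    if abs(d_xy - d_yx) < margin:
        return "inconclusive"
    return "X->Y" if d_xy < d_yx else "Y->X"


def cddr_curve(x, y, sizes, n_rep=100, decide=toy_direction, seed=0):
    """Estimate, for each n in `sizes`, the proportion of each outcome.

    Assumption: subsamples of size n are drawn without replacement from the
    N observed pairs, and `decide` is applied to each subsample.
    """
    rng = np.random.default_rng(seed)
    N = len(x)
    curve = {}
    for n in sizes:
        counts = dict.fromkeys(OUTCOMES, 0)
        for _ in range(n_rep):
            idx = rng.choice(N, size=n, replace=False)
            counts[decide(x[idx], y[idx])] += 1
        curve[n] = {k: v / n_rep for k, v in counts.items()}
    return curve


# Synthetic example: linear model with uniform (non-Gaussian) cause and noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=500)
y = x + rng.uniform(-1, 1, size=500)
curve = cddr_curve(x, y, sizes=(50, 100, 200))
for n, rates in curve.items():
    print(n, rates)
```

Plotting the three rates in `curve` against n would give a picture in the spirit of the diagnostic; passing a different `decide` function lets the same loop be reused with another causal discovery method.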