A fundamental problem of causal discovery is cause-effect inference: learning the correct causal direction between two random variables. Significant progress has been made by modelling the effect as a function of its cause and a noise term, which allows us to leverage assumptions about the generating function class. The recently introduced heteroscedastic location-scale noise functional models (LSNMs) combine expressive power with identifiability guarantees. Model selection for LSNMs based on likelihood maximization achieves state-of-the-art accuracy when the noise distributions are correctly specified. However, an extensive empirical evaluation shows that accuracy deteriorates sharply when the user misspecifies the form of the noise distribution. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance.
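The abstract contrasts two ways of choosing between the candidate models X→Y and Y→X, where an LSNM posits effect = f(cause) + g(cause)·N with N independent of the cause. The sketch below illustrates the second approach, residual independence testing, under assumptions of our own rather than the paper's. It estimates f and g with Gaussian-kernel local-linear smoothers (bandwidth from Silverman's rule of thumb) and measures the dependence between the putative cause and the standardized residuals with the biased HSIC statistic, using median-heuristic Gaussian kernels. The direction with the smaller statistic is preferred. The helper names (`lsnm_residuals`, `hsic`, `infer_direction`) are hypothetical. Comparing raw HSIC values instead of test p-values is a simplification.

```python
import numpy as np


def _standardize(v):
    """Return a zero-mean, unit-variance copy of a 1-D array."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def _local_linear(x, t, h):
    """Local-linear regression of t on x (Gaussian weights, bandwidth h), evaluated at each x_i."""
    d = x[None, :] - x[:, None]                # d[i, j] = x_j - x_i
    w = np.exp(-0.5 * (d / h) ** 2)            # kernel weights around evaluation point x_i
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d**2).sum(1)
    t0, t1 = w @ t, (w * d) @ t
    # Intercept of the weighted least-squares line a + b * d at each x_i
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1**2)


def lsnm_residuals(cause, effect):
    """Fit effect = f(cause) + g(cause) * noise; return the estimated noise (y - f) / g."""
    x, y = _standardize(cause), _standardize(effect)
    h = 1.06 * len(x) ** (-0.2)                # Silverman's rule (x has unit variance)
    f_hat = _local_linear(x, y, h)             # conditional mean estimate
    var_hat = _local_linear(x, (y - f_hat) ** 2, h)  # conditional variance estimate
    g_hat = np.sqrt(np.maximum(var_hat, 1e-6)) # clip so the scale stays positive
    return (y - f_hat) / g_hat


def _gaussian_gram(v):
    """Gaussian Gram matrix with bandwidth set by the median pairwise distance."""
    d2 = (v[:, None] - v[None, :]) ** 2
    med = np.median(d2[d2 > 0])                # squared median distance
    return np.exp(-d2 / (2.0 * med))


def hsic(a, b):
    """Biased empirical HSIC statistic trace(K H L H) / n**2 with Gaussian kernels."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    K, L = _gaussian_gram(a), _gaussian_gram(b)
    return float(np.sum((H @ K @ H) * L)) / n**2


def infer_direction(x, y):
    """Prefer the direction whose LSNM residuals look more independent of the cause."""
    score_xy = hsic(x, lsnm_residuals(x, y))   # leftover dependence if X causes Y
    score_yx = hsic(y, lsnm_residuals(y, x))   # leftover dependence if Y causes X
    return ("X->Y" if score_xy < score_yx else "Y->X"), score_xy, score_yx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 300)
    # Heteroscedastic noise: the noise scale grows with |x|
    y = np.tanh(x) + (0.2 + 0.3 * np.abs(x)) * rng.normal(size=300)
    print(infer_direction(x, y))
```

Because the median heuristic rescales each kernel to its data, the HSIC scores do not depend on the units of X or Y. The smoothers and the clipping constant are arbitrary choices, and other regressors or a permutation-based HSIC test could be substituted.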