A fundamental problem of causal discovery is cause-effect inference: learning the correct causal direction between two random variables. Significant progress has been made by modelling the effect as a function of its cause and a noise term, which allows us to leverage assumptions about the generating function class. The recently introduced heteroscedastic location-scale noise functional models (LSNMs) combine expressive power with identifiability guarantees. LSNM model selection based on maximizing likelihood achieves state-of-the-art accuracy when the noise distributions are correctly specified. However, through an extensive empirical evaluation, we demonstrate that its accuracy deteriorates sharply when the form of the noise distribution is misspecified by the user. Our analysis shows that the failure occurs mainly when the conditional variance in the anti-causal direction is smaller than that in the causal direction. As an alternative, we find that causal model selection through residual independence testing is much more robust to noise misspecification and misleading conditional variance.
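To make the two selection criteria concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes an LSNM of the form Y = f(X) + g(X) N with N independent of X, and replaces the flexible estimators normally used for f and g with a crude binned estimate of the conditional mean and standard deviation. The toy functions in `simulate_lsnm`, the bin count, the Gaussian-kernel HSIC statistic, and all function names are illustrative assumptions. The simulated noise is uniform while the likelihood score assumes Gaussian noise, so the noise form is deliberately misspecified.

```python
import math
import random


def simulate_lsnm(n, seed=0):
    """Toy LSNM (hypothetical f and g): y = f(x) + g(x) * noise, uniform noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.uniform(-math.sqrt(3.0), math.sqrt(3.0))  # unit variance
        xs.append(x)
        ys.append(math.tanh(2.0 * x) + 0.5 * x + (0.3 + 0.3 * abs(x)) * noise)
    return xs, ys


def standardize(v):
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / s for a in v]


def fit_location_scale(cause, effect, n_bins=10):
    """Binned estimate of E[effect|cause] and sd[effect|cause].

    Returns standardized residuals and the per-sample conditional scale.
    """
    n = len(cause)
    order = sorted(range(n), key=lambda i: cause[i])
    resid, scales = [0.0] * n, [0.0] * n
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]  # equal-size quantile bin
        vals = [effect[i] for i in idx]
        m = sum(vals) / len(vals)
        s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals)) + 1e-12
        for i in idx:
            resid[i] = (effect[i] - m) / s
            scales[i] = s
    return resid, scales


def gaussian_loglik(cause, effect, n_bins=10):
    """Average log-likelihood of cause -> effect, assuming Gaussian cause and noise."""
    n = len(cause)
    mu = sum(cause) / n
    sd = math.sqrt(sum((c - mu) ** 2 for c in cause) / n)
    resid, scales = fit_location_scale(cause, effect, n_bins)
    ll_cause = sum(-math.log(sd) - 0.5 * ((c - mu) / sd) ** 2 for c in cause) / n
    ll_effect = sum(-math.log(s) - 0.5 * r ** 2 for r, s in zip(resid, scales)) / n
    return ll_cause + ll_effect - math.log(2.0 * math.pi)


def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (median-heuristic bandwidth)."""
    n = len(a)

    def centered_gram(v):
        d2 = [[(vi - vj) ** 2 for vj in v] for vi in v]
        flat = sorted(d for row in d2 for d in row if d > 0)
        med = flat[len(flat) // 2] if flat else 1.0
        k = [[math.exp(-d / med) for d in row] for row in d2]
        row_mean = [sum(row) / n for row in k]
        total = sum(row_mean) / n
        return [[k[i][j] - row_mean[i] - row_mean[j] + total for j in range(n)]
                for i in range(n)]

    ka, kb = centered_gram(a), centered_gram(b)
    return sum(ka[i][j] * kb[i][j] for i in range(n) for j in range(n)) / n ** 2


def compare_directions(x, y, n_bins=10):
    """Score both directions: higher loglik, or lower residual dependence, is preferred."""
    x, y = standardize(x), standardize(y)
    result = {}
    for name, (c, e) in {"x->y": (x, y), "y->x": (y, x)}.items():
        resid, _ = fit_location_scale(c, e, n_bins)
        result[name] = {"loglik": gaussian_loglik(c, e, n_bins),
                        "hsic": hsic(c, resid)}
    return result


if __name__ == "__main__":
    x, y = simulate_lsnm(300)
    for direction, scores in compare_directions(x, y).items():
        print(direction, scores)
```

With standardized data and maximum-likelihood Gaussian fits, the quadratic terms in `gaussian_loglik` are constant, so the score differs between directions only through the average log conditional scale. This is one way to see why a smaller conditional variance in the anti-causal direction can mislead likelihood-based selection, whereas the `hsic` score depends only on how strongly the standardized residuals still depend on the putative cause.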