The field of causal discovery develops model selection methods to infer cause-effect relations among a set of random variables. For this purpose, different modelling assumptions have been proposed to render cause-effect relations identifiable. One prominent assumption is that the joint distribution of the observed variables follows a linear non-Gaussian structural equation model. In this paper, we develop novel goodness-of-fit tests that assess the validity of this assumption in the basic setting without latent confounders as well as in extension to linear models that incorporate latent confounders. Our approach involves testing algebraic relations among second and higher moments that hold as a consequence of the linearity of the structural equations. Specifically, we show that the linearity implies rank constraints on matrices and tensors derived from moments. For a practical implementation of our tests, we consider a multiplier bootstrap method that uses incomplete U-statistics to estimate subdeterminants, as well as asymptotic approximations to the null distribution of singular values. The methods are illustrated, in particular, for the Tübingen collection of benchmark data sets on cause-effect pairs.
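As a rough illustration of the kind of rank constraint described above, consider the simplest bivariate case, which is an assumption made here for illustration only: $X_1 = \varepsilon_1$ and $X_2 = b X_1 + \varepsilon_2$ with independent, mean-zero noise terms. Linearity then gives $E[X_1 X_2] = b\,E[X_1^2]$ and $E[X_1^2 X_2] = b\,E[X_1^3]$, so the $2 \times 2$ matrix with rows $(E[X_1^2], E[X_1 X_2])$ and $(E[X_1^3], E[X_1^2 X_2])$ has rank one and its determinant vanishes. The Python sketch below only checks this numerically on simulated data. It is not the multiplier bootstrap or the singular-value test of the paper, and the function names, the centered exponential noise, the coefficient 0.8, and the quadratic alternative are all hypothetical choices.

```python
import numpy as np


def moment_matrix(x1, x2):
    """2x2 matrix of sample second and third moments (data centered first)."""
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    return np.array([
        [np.mean(x1 * x1), np.mean(x1 * x2)],
        [np.mean(x1 ** 3), np.mean(x1 ** 2 * x2)],
    ])


def simulate(n, linear, seed=0):
    """Draw (x1, x2) with skewed noise; the model choices are illustrative."""
    rng = np.random.default_rng(seed)
    e1 = rng.exponential(1.0, n) - 1.0  # centered, non-Gaussian noise
    e2 = rng.exponential(1.0, n) - 1.0
    x1 = e1
    # Linear structural equation vs. a quadratic one that violates linearity.
    x2 = 0.8 * x1 + e2 if linear else x1 ** 2 + e2
    return x1, x2


n = 200_000
m_linear = moment_matrix(*simulate(n, linear=True))
m_nonlinear = moment_matrix(*simulate(n, linear=False))

# Under linearity the determinant should be close to zero (rank one);
# under the quadratic alternative it should be clearly away from zero.
det_linear = np.linalg.det(m_linear)
det_nonlinear = np.linalg.det(m_nonlinear)

# Smallest singular values tell the same story as the determinants.
sv_linear = np.linalg.svd(m_linear, compute_uv=False)[-1]
sv_nonlinear = np.linalg.svd(m_nonlinear, compute_uv=False)[-1]

print(f"linear:    det = {det_linear:.4f}, smallest singular value = {sv_linear:.4f}")
print(f"nonlinear: det = {det_nonlinear:.4f}, smallest singular value = {sv_nonlinear:.4f}")
```

A sample determinant near zero is only suggestive. Deciding whether it is significantly different from zero requires a calibrated null distribution, which is what the bootstrap and asymptotic approximations mentioned above are meant to provide.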