The goal of causal discovery is to develop automated search methods that learn causal structures from observational data. In some cases, all variables of the causal mechanism of interest are measured, and the task is to predict the effect one measured variable has on another. In other cases, the variables of primary interest are not directly observable and must instead be inferred from their manifestations in the data. These are referred to as latent variables. A well-known example is the psychological construct of intelligence, which cannot be measured directly, so researchers assess it through indicators such as IQ tests. In such cases, causal discovery algorithms can uncover underlying patterns and structures to reveal the causal connections among the latent variables and between the latent and observed variables. This thesis addresses two problems in causal discovery. The first is to provide an alternative definition of k-Triangle Faithfulness that (i) is weaker than Strong Faithfulness when applied to the Gaussian family of distributions, (ii) can be applied to non-Gaussian families of distributions, and (iii) under the assumption that the modified version of Strong Faithfulness holds, can be used to show the uniform consistency of a modified causal discovery algorithm. The second is to relax the sufficiency assumption in order to learn causal structures with latent variables. Given the importance of inferring cause-and-effect relationships for understanding and forecasting complex systems, relaxing these simplifying assumptions is expected to extend causal discovery methods to a wider range of causal mechanisms and statistical phenomena.