Causal models provide rich descriptions of complex systems as sets of mechanisms by which each variable is influenced by its direct causes. They support reasoning about manipulating parts of the system and thus hold promise for addressing some of the open challenges of artificial intelligence (AI), such as planning, transferring knowledge in changing environments, or robustness to distribution shifts. However, a key obstacle to more widespread use of causal models in AI is the requirement that the relevant variables be specified a priori, which is typically not the case for the high-dimensional, unstructured data processed by modern AI systems. At the same time, machine learning (ML) has proven quite successful at automatically extracting useful and compact representations of such complex data. Causal representation learning (CRL) aims to combine the core strengths of ML and causality by learning representations in the form of latent variables endowed with causal model semantics. In this thesis, we study and present new results for different CRL settings. A central theme is the question of identifiability: Given infinite data, when are representations satisfying the same learning objective guaranteed to be equivalent? This is an important prerequisite for CRL, as it formally characterises if and when a learning task is, at least in principle, feasible. Since learning causal models, even without a representation learning component, is notoriously difficult, we require additional assumptions on the model class or rich data beyond the classical i.i.d. setting. By partially characterising identifiability for different settings, this thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations. Ideally, the developed insights can help inform data collection practices or inspire the design of new practical estimation methods.