A topic of great current interest is Causal Representation Learning (CRL), whose goal is to learn a causal model for hidden features in a data-driven manner. Unfortunately, CRL is severely ill-posed, since it combines two notoriously ill-posed problems: representation learning and causal discovery. Finding identifiability conditions that guarantee a unique solution is therefore crucial for its practical applicability. Most approaches so far have relied on assumptions about the latent causal mechanisms, such as temporal causality, or on the existence of supervision or interventions; these can be too restrictive in actual applications. Here, we show identifiability based on novel, weak constraints, which require no temporal structure, interventions, or weak supervision. The approach is based on the assumption that the observational mixing exhibits a suitable grouping of the observational variables. We also propose a novel self-supervised estimation framework consistent with the model, prove its statistical consistency, and show experimentally that its CRL performance is superior to that of state-of-the-art baselines. We further demonstrate its robustness against latent confounders and causal cycles.