Supervised learning for causal discovery from observational data often achieves competitive performance while seemingly avoiding the explicit assumptions that traditional methods require for identifiability. In this work, we analyze CSIvA (Ke et al., 2023), a transformer architecture for amortized inference that promises to train on synthetic data and transfer to real data, on bivariate causal models. First, we bridge the gap with identifiability theory, showing that the training distribution implicitly defines a prior on the causal model of the test observations: consistent with classical approaches, good performance is achieved when we have a good prior on the test data and the underlying model is identifiable. Second, we find that CSIvA cannot generalize to classes of causal models unseen during training: to overcome this limitation, we theoretically and empirically analyze when training CSIvA on datasets generated by multiple identifiable causal models with different structural assumptions improves its generalization at test time. Overall, we find that amortized causal discovery with transformers still adheres to identifiability theory, contradicting the previous hypothesis from Lopez-Paz et al. (2015) that supervised learning methods could overcome its restrictions.