Supervised learning approaches to causal discovery from observational data often achieve competitive performance while seemingly avoiding the explicit assumptions that traditional methods make to ensure identifiability. In this work, we investigate CSIvA (Ke et al., 2023), a transformer-based model that is trained on synthetic data and promises to transfer to real data. First, we bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations. Consistent with classical approaches, good performance is achieved when we have a good prior on the test data and the underlying model is identifiable. At the same time, we find new trade-offs. Training on datasets generated from different classes of causal models, each unambiguously identifiable in isolation, improves test generalization. Performance is still guaranteed because the ambiguous cases that arise from mixing identifiable causal models are unlikely to occur, which we formally prove. Overall, our study finds that amortized causal discovery still needs to obey identifiability theory, but it differs from classical methods in how the assumptions are formulated, trading greater reliance on assumptions about the noise type for fewer hypotheses on the mechanisms.
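To make the idea of training on a mixture of identifiable model classes concrete, the following is a minimal sketch of how such synthetic training data could be generated for the bivariate case. It is an illustration under stated assumptions, not the generator used in the paper: the two classes (a linear mechanism with uniform noise, and a tanh mechanism with additive Gaussian noise), the names `MODEL_CLASSES`, `sample_dataset`, and `sample_mixed_training_set`, and all parameter ranges are hypothetical choices.

```python
import numpy as np

# Illustrative class names; these are assumptions, not identifiers from the paper.
MODEL_CLASSES = ("lingam", "anm")


def sample_dataset(rng, model_class, n_samples=500):
    """Draw one bivariate dataset and a label for its causal direction."""
    if model_class == "lingam":
        # Assumed class 1: linear mechanism with non-Gaussian (uniform) noise.
        cause = rng.uniform(-1.0, 1.0, size=n_samples)
        noise = rng.uniform(-1.0, 1.0, size=n_samples)
        weight = rng.uniform(0.5, 2.0) * rng.choice([-1.0, 1.0])
        effect = weight * cause + noise
    elif model_class == "anm":
        # Assumed class 2: nonlinear mechanism with additive Gaussian noise.
        cause = rng.normal(size=n_samples)
        noise = 0.5 * rng.normal(size=n_samples)
        effect = np.tanh(2.0 * cause) + noise
    else:
        raise ValueError(f"unknown model class: {model_class}")

    # Label 0: column 0 causes column 1. Label 1: column 1 causes column 0.
    label = int(rng.integers(2))
    columns = (cause, effect) if label == 0 else (effect, cause)
    return np.column_stack(columns), label


def sample_mixed_training_set(n_datasets, seed=0, n_samples=500):
    """Build a training set whose datasets come from a mixture of model classes."""
    rng = np.random.default_rng(seed)
    datasets, labels, classes = [], [], []
    for _ in range(n_datasets):
        model_class = str(rng.choice(MODEL_CLASSES))
        data, label = sample_dataset(rng, model_class, n_samples)
        datasets.append(data)
        labels.append(label)
        classes.append(model_class)
    return np.stack(datasets), np.array(labels), classes


if __name__ == "__main__":
    X, y, classes = sample_mixed_training_set(n_datasets=4, seed=0, n_samples=100)
    print(X.shape, y.shape, classes)
```

Each entry of `X` is one dataset that a supervised model would take as input, and the matching entry of `y` is the direction it should predict. Restricting `MODEL_CLASSES` to a single entry gives the single-class training setting that the mixture is contrasted with.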