Investigating the marginal causal effect of an intervention on an outcome from complex data remains challenging, both because commonly employed models are inflexible and because causal benchmark datasets lack complexity and often fail to reproduce intricate real-world data patterns. In this paper, we introduce Frugal Flows, a novel likelihood-based machine learning model that uses normalising flows to flexibly learn the data-generating process while also directly inferring marginal causal quantities from observational data. We propose that these models are exceptionally well suited for generating synthetic data to validate causal methods: they can create synthetic datasets that closely resemble the empirical dataset while automatically and exactly satisfying a user-defined average treatment effect. To our knowledge, Frugal Flows are the first generative models to both learn flexible data representations and exactly parameterise quantities such as the average treatment effect and the degree of unobserved confounding. We demonstrate these properties with experiments on both simulated and real-world datasets.