In this paper, we show how to exploit interventional data to learn the joint conditional distribution of all the variables using the Maximum Entropy principle. To this end, we extend the Causal Maximum Entropy method to make use of interventional data in addition to observational data. Using Lagrange duality, we prove that the solution to the Causal Maximum Entropy problem with interventional constraints lies in the exponential family, as does the classical Maximum Entropy solution. Our method enables two tasks of interest when marginal interventional distributions are provided for any subset of the variables. First, we show how to perform causal feature selection from a mixture of observational and single-variable interventional data; second, we show how to infer joint interventional distributions. For the former task, we show on synthetically generated data that our proposed method outperforms the state-of-the-art method for merging datasets and yields results comparable to the KCI test, which requires access to joint observations of all variables.
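To make the exponential-family claim concrete, the following is a standard Maximum Entropy derivation sketch; the notation here is illustrative and not necessarily the paper's own (the paper's interventional constraints generalize the moment constraints shown below):

```latex
% Maximum Entropy with linear (moment) constraints:
\begin{align}
\max_{p}\;& -\sum_{x} p(x)\log p(x)\\
\text{s.t.}\;& \mathbb{E}_{p}\!\left[f_j(X)\right] = \mu_j,\quad j = 1,\dots,m,
  \qquad \sum_{x} p(x) = 1.
\end{align}
% Setting the Lagrangian's derivative w.r.t. p(x) to zero yields an
% exponential-family solution with natural parameters \lambda_j:
\begin{equation}
p^{\ast}(x) \;=\; \frac{1}{Z(\lambda)}
  \exp\!\Big(\sum_{j=1}^{m} \lambda_j f_j(x)\Big),
\qquad
Z(\lambda) \;=\; \sum_{x} \exp\!\Big(\sum_{j=1}^{m} \lambda_j f_j(x)\Big).
\end{equation}
```

In the interventional setting described above, the constraints additionally pin down marginals of intervened distributions (e.g., of the form $p(x \mid do(X_i = x_i))$ for single-variable interventions), and the duality argument showing membership in the exponential family goes through analogously.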