Many real-world systems can be represented as sets of interacting components. Examples include computational systems such as query processors, natural systems such as cells, and social systems such as families. Many approaches have been proposed in traditional (associational) machine learning to model such structured systems, including statistical relational models and graph neural networks. Despite this prior work, existing approaches to estimating causal effects typically treat such systems as single units, represent them with a fixed set of variables, and assume a homogeneous data-generating process. We study a compositional approach for estimating individual treatment effects (ITE) in structured systems, where each unit is represented by the composition of multiple heterogeneous components. This approach uses a modular architecture to model potential outcomes at each component and aggregates the component-level potential outcomes to obtain unit-level potential outcomes. We identify novel benefits of the compositional approach in causal inference: systematic generalization to counterfactual outcomes of unseen combinations of components, and improved overlap guarantees between treatment and control groups compared to classical methods for causal effect estimation. We also introduce a set of novel environments for empirically evaluating the compositional approach and demonstrate its effectiveness on both simulated and real-world data.
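The modular idea in the abstract (one outcome model per component, aggregated to the unit level) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the component types (`scan`, `join`, `sort`), the linear form of each module, the noise-free simulated data, the availability of component-level outcomes during training, and additive aggregation of component outcomes. Under those assumptions, modules fitted on some combinations of components can be recombined to estimate the ITE of a unit whose combination never appeared in training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: each component type has its own outcome model
# y = a*x + b*t + c*x*t (noise-free so the sketch stays deterministic).
TRUE_PARAMS = {"scan": (1.0, 2.0, 0.5), "join": (-0.5, 1.0, 1.5), "sort": (0.3, -1.0, 0.2)}


def component_outcome(ctype, x, t):
    a, b, c = TRUE_PARAMS[ctype]
    return a * x + b * t + c * x * t


def design(x, t):
    # Feature matrix for one component-level module: columns [x, t, x*t].
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.column_stack([x, t, x * t])


def simulate_training(n_units=200):
    # Training units only ever contain these two combinations of component types.
    seen_combos = [("scan", "join"), ("join", "sort")]
    records = []
    for i in range(n_units):
        t = int(rng.integers(0, 2))  # unit-level treatment, shared by its components
        for ctype in seen_combos[i % 2]:
            x = float(rng.normal())
            records.append((ctype, x, t, component_outcome(ctype, x, t)))
    return records


def fit_modules(records):
    # Fit one least-squares module per component type from component-level data.
    modules = {}
    for ctype in {r[0] for r in records}:
        rows = [r for r in records if r[0] == ctype]
        X = design([r[1] for r in rows], [r[2] for r in rows])
        y = np.array([r[3] for r in rows])
        modules[ctype], *_ = np.linalg.lstsq(X, y, rcond=None)
    return modules


def unit_potential_outcome(modules, unit, t):
    # Additive aggregation of component-level potential outcomes (an assumption).
    return sum((design([x], [t]) @ modules[ctype])[0] for ctype, x in unit)


modules = fit_modules(simulate_training())

# A unit whose combination of component types never occurs in the training data.
new_unit = [("scan", 0.4), ("sort", -1.2), ("sort", 2.0)]
ite_hat = unit_potential_outcome(modules, new_unit, 1) - unit_potential_outcome(modules, new_unit, 0)
ite_true = sum(component_outcome(c, x, 1) - component_outcome(c, x, 0) for c, x in new_unit)
print(f"estimated ITE: {ite_hat:.4f}, true ITE: {ite_true:.4f}")
```

A single model fitted on unit-level features would have no training examples resembling `new_unit`, whereas each per-type module here has seen both treatment arms for its own component type. That is a toy version of the generalization and overlap arguments the abstract makes, not a reproduction of the paper's estimator.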