The era of big data has brought increasingly abundant observational data from mobile and social networking, online advertising, web mining, healthcare, education, public policy, marketing campaigns, and other domains, which facilitates the development of causal effect estimation. Although academic research has made significant advances on challenges such as missing counterfactual outcomes and selection bias, existing methods focus only on source-specific and stationary observational data, an assumption that is unrealistic in most industrial applications. In this paper, we investigate a new industrial problem, causal effect estimation from incrementally available observational data, and accordingly present three new evaluation criteria: extensibility, adaptability, and accessibility. We propose a Continual Causal Effect Representation Learning method for estimating causal effects from observational data that become incrementally available from non-stationary data distributions. Instead of requiring access to all previously seen observational data, our method stores only a limited subset of the feature representations learned from previous data. By combining selective and balanced representation learning, feature representation distillation, and feature transformation, our method achieves continual causal effect estimation for new data without compromising its estimation capability for the original data. Extensive experiments demonstrate the significance of continual causal effect estimation and the effectiveness of our method.
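To make the memory and distillation ideas concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not say how the limited subset of feature representations is chosen or which distillation loss is used; the sketch assumes a greedy herding-style selection (keep the representations whose running mean best tracks the mean of all representations) and a mean squared Euclidean distance between stored and re-encoded representations. The names `select_memory` and `distillation_loss` are hypothetical.

```python
from math import dist
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(rows: Sequence[Vector]) -> List[float]:
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_memory(features: Sequence[Vector], budget: int) -> List[int]:
    """Assumed herding-style selection (not confirmed by the abstract).

    Greedily picks up to `budget` row indices so that the mean of the
    selected representations stays close to the mean of all of them.
    """
    target = mean_vector(features)
    running_sum = [0.0] * len(target)
    chosen: List[int] = []
    remaining = set(range(len(features)))

    while remaining and len(chosen) < budget:
        k = len(chosen) + 1

        def gap(i: int) -> float:
            # Distance between the target mean and the mean of the
            # selection if representation i were added next.
            candidate = [(s + x) / k for s, x in zip(running_sum, features[i])]
            return dist(candidate, target)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = min(sorted(remaining), key=gap)
        chosen.append(best)
        remaining.remove(best)
        running_sum = [s + x for s, x in zip(running_sum, features[best])]

    return chosen


def distillation_loss(old_repr: Sequence[Vector], new_repr: Sequence[Vector]) -> float:
    """Assumed distillation penalty: mean squared Euclidean distance between
    stored representations and the updated model's representations of the
    same units."""
    return sum(dist(o, n) ** 2 for o, n in zip(old_repr, new_repr)) / len(old_repr)
```

In a continual setting, one would call `select_memory` on the representations learned from the current data before moving on, keep only the selected rows, and add `distillation_loss` as a penalty when training on the next batch of data so that the updated representation space stays compatible with the stored one.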