Many traditional recommendation algorithms are built on the idea of mining or learning correlative patterns from data to estimate user-item correlative preference. However, purely correlative learning may lead to Simpson's paradox in predictions and thus degrade recommendation performance. Simpson's paradox is a well-known statistical phenomenon that can confound statistical conclusions, and ignoring it may result in inaccurate decisions. Fortunately, causal and counterfactual modeling can help us reason beyond the observational data for user modeling and personalization, and so tackle such issues. In this paper, we propose Causal Collaborative Filtering (CCF) -- a general framework for modeling causality in collaborative filtering (CF) and recommendation. We provide a unified causal view of CF and show mathematically that many traditional CF algorithms are special cases of CCF under simplified causal graphs. We then propose a conditional intervention approach for $do$-operations so that user-item causal preference can be estimated from observational data. Finally, we propose a general counterfactual constrained learning framework for estimating user-item preferences. Experiments are conducted on two types of real-world datasets -- traditional and randomized trial data -- and the results show that our framework can improve recommendation performance and mitigate the Simpson's paradox problem in many CF algorithms.
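To make the Simpson's paradox concern concrete, the following minimal sketch contrasts a pooled correlational estimate with a backdoor-adjusted estimate of $P(\text{click} \mid do(\text{item}))$. It is only an illustration under stated assumptions: the click counts, the two user segments, and all function names are hypothetical and are not taken from the paper's data or code, and the estimator is the standard textbook backdoor adjustment over a single observed confounder, not the conditional intervention approach proposed in CCF.

```python
from fractions import Fraction

# Hypothetical logged feedback (illustrative numbers only):
# (user segment, item) -> (clicks, impressions).
LOGS = {
    ("casual", "A"): (81, 87),
    ("casual", "B"): (234, 270),
    ("heavy", "A"): (192, 263),
    ("heavy", "B"): (55, 80),
}
SEGMENTS = ("casual", "heavy")
ITEMS = ("A", "B")


def naive_rate(item):
    """Correlational estimate P(click | item), pooling all segments."""
    clicks = sum(LOGS[(s, item)][0] for s in SEGMENTS)
    shown = sum(LOGS[(s, item)][1] for s in SEGMENTS)
    return Fraction(clicks, shown)


def stratum_rate(segment, item):
    """Within-segment estimate P(click | item, segment)."""
    clicks, shown = LOGS[(segment, item)]
    return Fraction(clicks, shown)


def segment_prior(segment):
    """P(segment), estimated from impressions across all items."""
    in_segment = sum(LOGS[(segment, i)][1] for i in ITEMS)
    total = sum(shown for _, shown in LOGS.values())
    return Fraction(in_segment, total)


def adjusted_rate(item):
    """Backdoor adjustment: sum_z P(click | item, z) * P(z).

    Assumes the segment is the only confounder of exposure and click.
    """
    return sum(stratum_rate(s, item) * segment_prior(s) for s in SEGMENTS)


if __name__ == "__main__":
    for item in ITEMS:
        print(item,
              "naive:", round(float(naive_rate(item)), 4),
              "adjusted:", round(float(adjusted_rate(item)), 4))
```

In this toy log, item B has the higher pooled click rate, yet item A has the higher click rate within each segment and after adjustment. The reversal arises because exposure is unevenly distributed across segments, which is the kind of confounding that a purely correlative model cannot distinguish from genuine preference.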