Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems

Recommendation performance usually exhibits a long-tail distribution over users: a small portion of head users enjoy much more accurate recommendation services than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source) and the biased training of recommender models (a model source). Because addressing this problem should not sacrifice overall performance, a wise choice is to eliminate the model bias while maintaining the natural heterogeneity. The key to debiased training lies in eliminating the effect of confounders that influence both the user's historical behaviors and the next behavior. Emerging causal recommendation methods achieve this by modeling the causal effect between user behaviors, but they may neglect unobserved confounders (e.g., friend suggestions) that are hard to measure in practice. To address unobserved confounders, we resort to the front-door adjustment (FDA) in causal theory and propose a causal multi-teacher distillation framework (CausalD). FDA requires proper mediators in order to estimate the causal effect of historical behaviors on the next behavior. To this end, we equip CausalD with multiple heterogeneous recommendation models to model the mediator distribution. The causal effect estimated by FDA is then the expectation of the recommendation prediction over the mediator distribution and the prior distribution of historical behaviors, which is technically achieved by a multi-teacher ensemble. To pursue efficient inference, CausalD further distills the multiple teachers into one student model that directly infers the causal effect for making recommendations.
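
As a rough illustration of the front-door adjustment mentioned above, the following minimal sketch evaluates the standard formula P(Y=1 | do(X=x)) = sum_m P(m | x) * sum_x' P(Y=1 | m, x') * P(x') on a toy discrete example. It is not the CausalD implementation: the binary variables, the probability tables, and the function name `front_door_effect` are all hypothetical, chosen only to show the two nested expectations (over the mediator distribution and over the prior of historical behaviors). In CausalD these expectations are approximated by a multi-teacher ensemble rather than by explicit tables.

```python
def front_door_effect(x, p_m_given_x, p_y_given_mx, p_x):
    """Front-door estimate of P(Y=1 | do(X=x)) for discrete X and M.

    p_m_given_x[x][m]       : P(M=m | X=x), the mediator distribution
    p_y_given_mx[(m, x')]   : P(Y=1 | M=m, X=x'), the prediction
    p_x[x']                 : P(X=x'), the prior over historical behaviors
    """
    total = 0.0
    for m, p_m in p_m_given_x[x].items():
        # Inner expectation over the prior of historical behaviors x'.
        inner = sum(p_y_given_mx[(m, xp)] * p_xp for xp, p_xp in p_x.items())
        # Outer expectation over the mediator distribution given x.
        total += p_m * inner
    return total


# Hypothetical toy tables (assumed values, for illustration only).
p_x = {0: 0.5, 1: 0.5}
p_m_given_x = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
p_y_given_mx = {(0, 0): 0.25, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.75}

effect_x0 = front_door_effect(0, p_m_given_x, p_y_given_mx, p_x)
effect_x1 = front_door_effect(1, p_m_given_x, p_y_given_mx, p_x)
print(effect_x0, effect_x1)
```

With these assumed tables, the estimate for x=1 is larger than for x=0, because x=1 shifts the mediator toward the value under which the prediction is higher regardless of x'.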
