Causal Representation Learning for Generalisable Recommendation

Predictive models trained on observational data often fail to generalise to the distributions they encounter when deployed, especially when the training data is a product of the system being optimised. Recommender systems are a canonical example: they are trained on interaction logs confounded by the deployed policy, past user behaviour, and platform filtering. As a result, the training distribution differs substantially from the candidate distribution scored at serving time, a gap that makes offline metrics unreliable predictors of online performance. We address this distribution shift problem with a method motivated by causal representation learning (CRL). We propose an information-theoretic disentanglement criterion and prove that its optimum depends only on the causal components of the input. We then derive a tractable variational lower bound that makes the criterion optimisable from finite observational data alone. The scope of our method is narrower than that of much of the CRL literature, in that we target better generalisation under distribution shift, not full identification of all latent causal factors. This narrower target is what makes the method practical: it requires only the existing confounded logs, applies to any standard supervised model, and adds no inference-time cost. Our headline evaluation is an A/B test with millions of users on Spotify, applied to a production ranker for personalised playlist generation. A capacity-matched CRL variant performed on par with the baseline offline but delivered substantial online gains in listener engagement. Complementary evidence on the public KuaiRand recommendation dataset and a synthetic benchmark with known causal structure shows the same pattern: offline parity with the baseline, gains under distribution shift. Across all three settings, adding our causal disentanglement objective yields meaningfully better out-of-distribution generalisation.
