This work studies how to learn unbiased recommendation algorithms from biased feedback, approaching the problem from a novel distribution shift perspective. Recent work on unbiased recommendation has advanced the state of the art with techniques such as re-weighting, multi-task learning, and meta-learning. Despite their empirical success, most of these methods lack theoretical guarantees, leaving a non-negligible gap between theory and recent algorithms. In this paper, we provide a theoretical account of why existing unbiased learning objectives work for unbiased recommendation. We establish a close connection between unbiased recommendation and distribution shift, showing that existing unbiased learning objectives implicitly align the biased training distribution with the unbiased test distribution. Building on this connection, we derive two generalization bounds for existing unbiased learning methods and analyze their learning behavior. Furthermore, in light of this distribution shift, we propose a principled framework, Adversarial Self-Training (AST), for unbiased recommendation. Extensive experiments on real-world and semi-synthetic datasets demonstrate the effectiveness of AST.
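For intuition on how re-weighting can align a biased training distribution with an unbiased test distribution, a standard importance-weighting identity is sketched below. The symbols $p_{\mathrm{train}}$, $p_{\mathrm{test}}$, $\ell$, and $w$ are illustrative assumptions rather than notation taken from the paper, and the identity is a textbook fact, not one of the paper's generalization bounds.

```latex
% Illustrative sketch only; all symbols are assumed, not the paper's notation.
% (u, i) is a user-item pair, \ell(u, i) is the model's loss on that pair,
% p_train is the biased (observed) distribution over pairs, and
% p_test is the unbiased target distribution.
% Assumes p_train(u, i) > 0 wherever p_test(u, i) > 0.
\begin{align}
\mathbb{E}_{(u,i) \sim p_{\mathrm{test}}}\big[\ell(u,i)\big]
  &= \sum_{(u,i)} p_{\mathrm{test}}(u,i)\, \ell(u,i) \\
  &= \sum_{(u,i)} p_{\mathrm{train}}(u,i)\,
     \frac{p_{\mathrm{test}}(u,i)}{p_{\mathrm{train}}(u,i)}\, \ell(u,i) \\
  &= \mathbb{E}_{(u,i) \sim p_{\mathrm{train}}}\big[w(u,i)\, \ell(u,i)\big],
  \qquad w(u,i) = \frac{p_{\mathrm{test}}(u,i)}{p_{\mathrm{train}}(u,i)}.
\end{align}
```

Under these assumptions, minimizing the weighted loss on biased data targets the same quantity as minimizing the plain loss on unbiased data. In practice the weights $w$ must be estimated, and that estimation is one place where a gap between the training and test objectives can reappear.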