Pairwise learning underpins implicit collaborative filtering, yet its effectiveness is often hindered by sparse supervision, noisy interactions, and popularity-driven exposure bias. In this paper, we propose Variational Bayesian Personalized Ranking (VarBPR), a tractable variational framework for implicit-feedback pairwise learning that offers principled exposure controllability and theoretical interpretability. VarBPR reformulates pairwise learning as variational inference over discrete latent indexing variables, explicitly modeling noise and indexing uncertainty, and splits training into two stages: variational inference and variational learning. In the variational inference stage, we develop a variational formulation that unifies preference alignment, denoising, and popularity debiasing under a single ELBO-with-regularization objective, deriving closed-form posteriors with clear control semantics: the prior encodes a target exposure pattern, while a temperature (the regularization strength) controls how closely the posterior adheres to the prior. Exposure controllability thus becomes an endogenous, interpretable outcome of variational inference. In the variational learning stage, we propose a posterior-compression objective that reduces the computational complexity of the ideal ELBO from polynomial to linear, with the approximation justified by an explicit upper bound on the Jensen gap. Theoretically, we provide interpretable generalization guarantees by identifying a structural error component and revealing the opportunity cost of prioritizing particular exposure patterns (e.g., long-tail exposure), offering a concrete analytical lens for designing controllable recommender systems. Empirically, we validate VarBPR on popular backbones: it delivers consistent gains in ranking accuracy, enables controlled long-tail exposure, and preserves the linear-time complexity of BPR.
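The closed-form objective itself is not reproduced in this abstract, but the two ingredients it builds on can be sketched. The snippet below is a minimal illustration, not the paper's method: a standard BPR pairwise loss, plus a hypothetical popularity-tilted exposure prior whose exponent `tau` (a name we introduce here) interpolates between uniform exposure and long-tail exposure, in the spirit of the "prior encodes a target exposure pattern" semantics described above.

```python
import numpy as np

def bpr_loss(score_pos, score_neg):
    """Standard BPR pairwise objective: -log sigmoid(score_pos - score_neg)."""
    return -np.log(1.0 / (1.0 + np.exp(-(score_pos - score_neg))))

def exposure_prior(popularity, tau):
    """Illustrative exposure prior over candidate items: p(j) proportional
    to popularity_j ** (-tau). tau = 0 gives a uniform prior; larger tau
    tilts exposure toward long-tail (low-popularity) items.
    (The name and exact functional form are our assumption, not the paper's.)"""
    weights = popularity ** (-tau)
    return weights / weights.sum()

# Three items with popularity counts 100, 10, 1.
popularity = np.array([100.0, 10.0, 1.0])
p_uniform = exposure_prior(popularity, tau=0.0)  # uniform over the three items
p_tail = exposure_prior(popularity, tau=1.0)     # mass shifted to the tail item
loss = bpr_loss(2.0, 1.0)                        # positive item scored above negative
```

In VarBPR's framing, the temperature governs how closely the inferred posterior follows such a prior; the paper derives this adherence in closed form, which this toy sketch does not attempt.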