Pairwise learning underpins implicit collaborative filtering, yet its effectiveness is often hindered by sparse supervision, noisy interactions, and popularity-driven exposure bias. In this paper, we propose Variational Bayesian Personalized Ranking (VarBPR), a tractable variational framework for implicit-feedback pairwise learning that offers principled exposure controllability and theoretical interpretability. VarBPR reformulates pairwise learning as variational inference over discrete latent indexing variables, explicitly modeling noise and indexing uncertainty, and divides training into two stages: variational inference and variational learning. In the variational inference stage, we develop a variational formulation that integrates preference alignment, denoising, and popularity debiasing under a unified ELBO/regularization objective, deriving closed-form posteriors with clear control semantics: the prior encodes a target exposure pattern, while temperature/regularization strength controls posterior-prior adherence. As a result, exposure controllability becomes an endogenous and interpretable outcome of variational inference. In the variational learning stage, we propose a posterior-compression objective that reduces the ideal ELBO's computational complexity from polynomial to linear, with the approximation justified by an explicit Jensen-gap upper bound. Theoretically, we provide interpretable generalization guarantees by identifying a structural error component and revealing the opportunity cost of prioritizing certain exposure patterns (e.g., long-tail), offering a concrete analytical lens for designing controllable recommender systems. Empirically, we validate VarBPR across popular backbones; it demonstrates consistent gains in ranking accuracy, enables controlled long-tail exposure, and preserves the linear-time complexity of BPR.
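The control semantics described above (a prior that encodes a target pattern, and a temperature that governs how closely the posterior adheres to it) can be illustrated with a generic KL-regularized objective over a discrete distribution. The sketch below is not VarBPR's actual objective, which the abstract does not spell out. It assumes the textbook problem of maximizing E_q[score] - temperature * KL(q || prior) over the probability simplex, whose maximizer is q_i ∝ prior_i · exp(score_i / temperature). The function name `kl_regularized_posterior` and the example `scores` and `prior` values are hypothetical.

```python
import math


def kl_regularized_posterior(scores, prior, temperature):
    """Maximizer of E_q[score] - temperature * KL(q || prior) over the simplex.

    Generic closed form (an assumption for illustration, not the paper's
    stated posterior): q_i is proportional to prior_i * exp(score_i / temperature).
    Assumes every prior entry is strictly positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space and subtract the maximum for numerical stability.
    logits = [math.log(p) + s / temperature for s, p in zip(scores, prior)]
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three candidates with preference scores and a prior
# that deliberately favors the lowest-scoring (e.g. long-tail) candidate.
scores = [2.0, 1.0, 0.0]
prior = [0.2, 0.3, 0.5]

# Low temperature: the scores dominate and the posterior concentrates on the
# highest-scoring candidate.
cold = kl_regularized_posterior(scores, prior, temperature=0.05)

# High temperature: the KL term dominates and the posterior stays close to
# the prior.
hot = kl_regularized_posterior(scores, prior, temperature=1000.0)
```

Under these assumptions, the temperature acts as a single dial between a score-driven posterior and a prior-driven one, which is the kind of posterior-prior adherence the abstract refers to.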