RLPO: Residual Listwise Preference Optimization for Long-Context Review Ranking

Review ranking is pivotal in e-commerce for prioritizing diagnostic and authentic feedback from the deluge of user-generated content. While large language models have improved semantic assessment, existing ranking paradigms face a persistent trade-off in long-context settings. Pointwise scoring is efficient but often fails to account for list-level interactions, leading to miscalibrated top-$k$ rankings. Listwise approaches can leverage global context, yet they are computationally expensive and become unstable as candidate lists grow. To address this trade-off, we propose Residual Listwise Preference Optimization (RLPO), which formulates ranking as a listwise, representation-level residual correction on top of a strong pointwise LLM scorer. RLPO first produces calibrated pointwise scores and item representations, then applies a lightweight encoder over the representations to predict listwise score residuals, avoiding full token-level listwise processing. We also introduce a large-scale, human-verified benchmark for long-context review ranking. Experiments show that RLPO improves NDCG@k over strong pointwise and listwise baselines and remains robust as list length increases.
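The abstract describes each final ranking score as a pointwise LLM score plus a listwise residual predicted from item representations. The sketch below illustrates only that composition, $s_i = p_i + r_i$, under explicit assumptions: the "lightweight encoder" is replaced by a hypothetical single-head self-attention pass with identity projections followed by a hand-set linear head (`w_out`); all inputs and weights are made up and untrained; and the training objective (the "preference optimization" part) is not modeled at all. `ndcg_at_k` uses the common exponential-gain definition of NDCG@k, which may differ from the exact variant used in the paper.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_residuals(reps: List[List[float]], w_out: Sequence[float]) -> List[float]:
    # HYPOTHETICAL stand-in for the "lightweight encoder": one single-head
    # self-attention pass with identity Q/K/V projections, so each item's
    # representation is mixed with the rest of the list, followed by a
    # hand-set linear head w_out mapping each mixed vector to a scalar residual.
    d = len(reps[0])
    residuals = []
    for q in reps:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in reps])
        context = [sum(w * v[j] for w, v in zip(weights, reps)) for j in range(d)]
        residuals.append(dot(context, w_out))
    return residuals


def residual_scores(pointwise: Sequence[float], reps: List[List[float]],
                    w_out: Sequence[float]) -> List[float]:
    # Final score = pointwise score + listwise residual (s_i = p_i + r_i).
    return [p + r for p, r in zip(pointwise, attention_residuals(reps, w_out))]


def ndcg_at_k(ranked_relevance: Sequence[float], k: int) -> float:
    # NDCG@k with exponential gain (2^rel - 1) and log2(rank + 1) discount;
    # ranked_relevance holds graded labels in the model's ranked order.
    def dcg(rels: Sequence[float]) -> float:
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    pointwise = [0.9, 0.7, 0.4]                  # hypothetical pointwise LLM scores
    reps = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # hypothetical item representations
    w_out = [-0.2, 0.3]                          # hypothetical residual-head weights
    labels = [2, 1, 3]                           # hypothetical graded relevance labels

    final = residual_scores(pointwise, reps, w_out)
    order = sorted(range(len(final)), key=lambda i: final[i], reverse=True)
    print("final scores:", final)
    print("NDCG@2:", ndcg_at_k([labels[i] for i in order], k=2))
```

Keeping the pointwise score as an additive base means a zero residual head reproduces the pointwise ranking exactly, so the listwise component only has to learn corrections. This is one plausible reading of "residual correction", not a confirmed detail of RLPO.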
