Review ranking is pivotal in e-commerce for prioritizing diagnostic and authentic feedback from the deluge of user-generated content. While large language models (LLMs) have improved semantic assessment, existing ranking paradigms face a persistent trade-off in long-context settings. Pointwise scoring is efficient but often fails to account for list-level interactions, leading to miscalibrated top-$k$ rankings. Listwise approaches can leverage global context, yet they are computationally expensive and become unstable as candidate lists grow. To address this, we propose Residual Listwise Preference Optimization (RLPO), which formulates ranking as a listwise, representation-level residual correction over a strong pointwise LLM scorer. RLPO first produces calibrated pointwise scores and item representations, then applies a lightweight encoder over those representations to predict listwise score residuals, avoiding full token-level listwise processing. We also introduce a large-scale, human-verified benchmark for long-context review ranking. Experiments show that RLPO improves NDCG@$k$ over strong pointwise and listwise baselines and remains robust as list length increases.
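To make the residual-correction step concrete, below is a minimal PyTorch sketch of one plausible instantiation, not the paper's released implementation: the module name `ResidualListwiseHead`, the dimensions, and the assumption that the pointwise scorer exposes pooled per-item representations are all illustrative. Because the head attends only over pooled item vectors rather than full token sequences, its cost grows with list length in $d_{\text{model}}$-sized vectors, which is what lets it avoid full token-level listwise processing.

```python
# A minimal sketch, assuming PyTorch; names and shapes are illustrative.
from typing import Optional

import torch
import torch.nn as nn


class ResidualListwiseHead(nn.Module):
    """Lightweight encoder that predicts listwise score residuals over
    item representations produced by a (frozen) pointwise LLM scorer."""

    def __init__(self, d_model: int = 768, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.residual_proj = nn.Linear(d_model, 1)

    def forward(
        self,
        item_reprs: torch.Tensor,          # (batch, list_len, d_model)
        pointwise_scores: torch.Tensor,    # (batch, list_len)
        padding_mask: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # Attend across the candidate list so each item's residual can
        # depend on the other items: the list-level interaction that a
        # purely pointwise scorer misses.
        ctx = self.encoder(item_reprs, src_key_padding_mask=padding_mask)
        residuals = self.residual_proj(ctx).squeeze(-1)  # (batch, list_len)
        # Final score = calibrated pointwise score + listwise residual.
        return pointwise_scores + residuals


# Usage: rank a list of 50 candidate reviews with stand-in tensors.
head = ResidualListwiseHead()
reprs = torch.randn(1, 50, 768)   # stand-in for LLM item representations
scores = torch.randn(1, 50)      # stand-in for calibrated pointwise scores
ranking = head(reprs, scores).argsort(dim=-1, descending=True)
```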