Review ranking is pivotal in e-commerce for prioritizing diagnostic and authentic feedback from the deluge of user-generated content. While large language models (LLMs) have improved semantic assessment, existing ranking paradigms face a persistent trade-off in long-context settings. Pointwise scoring is efficient but often fails to account for list-level interactions, leading to miscalibrated top-$k$ rankings. Listwise approaches can leverage global context, yet they are computationally expensive and become unstable as candidate lists grow. To address this, we propose Residual Listwise Preference Optimization (RLPO), which formulates ranking as a listwise, representation-level residual correction over a strong pointwise LLM scorer. RLPO first produces calibrated pointwise scores and item representations, then applies a lightweight encoder over those representations to predict listwise score residuals, avoiding full token-level listwise processing. We also introduce a large-scale, human-verified benchmark for long-context review ranking. Experiments show that RLPO improves NDCG@$k$ over strong pointwise and listwise baselines and remains robust as list length increases.
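To make the residual-correction step concrete, below is a minimal PyTorch sketch of one plausible instantiation, not the paper's released implementation: the module name `ResidualListwiseHead`, the dimensions, and the assumption that the pointwise scorer exposes pooled per-item representations are all illustrative. Because the head attends only over pooled item vectors rather than full token sequences, its cost grows with list length in $d_{\text{model}}$-sized vectors, which is what lets it avoid full token-level listwise processing.

```python
# A minimal sketch, assuming PyTorch; names and shapes are illustrative.
from typing import Optional

import torch
import torch.nn as nn


class ResidualListwiseHead(nn.Module):
    """Lightweight encoder that predicts listwise score residuals over
    item representations produced by a (frozen) pointwise LLM scorer."""

    def __init__(self, d_model: int = 768, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.residual_proj = nn.Linear(d_model, 1)

    def forward(
        self,
        item_reprs: torch.Tensor,          # (batch, list_len, d_model)
        pointwise_scores: torch.Tensor,    # (batch, list_len)
        padding_mask: Optional[torch.Tensor] = None,
    ) -> torch.Tensor:
        # Attend across the candidate list so each item's residual can
        # depend on the other items: the list-level interaction that a
        # purely pointwise scorer misses.
        ctx = self.encoder(item_reprs, src_key_padding_mask=padding_mask)
        residuals = self.residual_proj(ctx).squeeze(-1)  # (batch, list_len)
        # Final score = calibrated pointwise score + listwise residual.
        return pointwise_scores + residuals


# Usage: rank a list of 50 candidate reviews with stand-in tensors.
head = ResidualListwiseHead()
reprs = torch.randn(1, 50, 768)   # stand-in for LLM item representations
scores = torch.randn(1, 50)      # stand-in for calibrated pointwise scores
ranking = head(reprs, scores).argsort(dim=-1, descending=True)
```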