Review ranking is pivotal in e-commerce for surfacing diagnostic and authentic feedback from the deluge of user-generated content. While large language models have improved semantic assessment, existing ranking paradigms face a persistent trade-off in long-context settings. Pointwise scoring is efficient but scores each review in isolation, so it often fails to account for list-level interactions, leading to miscalibrated top-$k$ rankings. Listwise approaches can leverage global context, yet they are computationally expensive and become unstable as candidate lists grow. To address this, we propose Residual Listwise Preference Optimization (RLPO), which formulates ranking as a listwise, representation-level residual correction on top of a strong pointwise LLM scorer. RLPO first produces calibrated pointwise scores and item representations, then applies a lightweight encoder over those representations to predict listwise score residuals, avoiding full token-level listwise processing. We also introduce a large-scale, human-verified benchmark for long-context review ranking. Experiments show that RLPO improves NDCG@$k$ over strong pointwise and listwise baselines and remains robust as list length increases.
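The residual idea can be illustrated with a minimal sketch. The abstract does not specify the encoder architecture, how the residual is combined with the pointwise score, or the training objective, so everything below is an assumption made for illustration: a single self-attention layer stands in for the "lightweight encoder", the residual is added to the pointwise score, the weights are untrained placeholders, and the names `listwise_residuals`, `rlpo_style_scores`, and `ndcg_at_k` are hypothetical. NDCG@$k$ is computed with the common exponential-gain form, which may differ from the variant used in the experiments.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def listwise_residuals(H, Wq, Wk, Wv, w_out):
    """Assumed encoder: one self-attention layer over the n item
    representations H (n, d), followed by a linear head giving one
    scalar residual per item. Cost is O(n^2 d) on vectors, with no
    token-level processing of the full list."""
    d = H.shape[1]
    attn = softmax((H @ Wq) @ (H @ Wk).T / np.sqrt(d))  # (n, n) item-to-item weights
    context = attn @ (H @ Wv)                           # (n, d) list-aware context
    return (H + context) @ w_out                        # (n,) residuals

def rlpo_style_scores(pointwise, H, params):
    # Assumption: the final score is the pointwise score plus the residual.
    return pointwise + listwise_residuals(H, *params)

def ndcg_at_k(scores, relevance, k):
    """NDCG@k with gain 2^rel - 1 and log2 rank discount."""
    order = np.argsort(-scores)[:k]
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = np.sum((2.0 ** relevance[order] - 1) * discounts[:len(order)])
    ideal = np.sort(relevance)[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1) * discounts[:len(ideal)])
    return dcg / idcg if idcg > 0 else 0.0

# Toy usage with random placeholders standing in for LLM outputs.
rng = np.random.default_rng(0)
n, d = 6, 8
H = rng.normal(size=(n, d))          # item representations (placeholder)
pointwise = rng.normal(size=n)       # pointwise scores (placeholder)
params = tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(3)) \
    + (rng.normal(scale=0.1, size=d),)
relevance = rng.integers(0, 3, size=n)

final = rlpo_style_scores(pointwise, H, params)
print("NDCG@3 pointwise:", ndcg_at_k(pointwise, relevance, 3))
print("NDCG@3 corrected:", ndcg_at_k(final, relevance, 3))
```

With untrained weights the corrected ranking is not expected to be better than the pointwise one; the sketch only shows the data flow. One property of the additive form is that a zero residual head reproduces the pointwise ranking exactly, so under this assumption the correction can only depart from the pointwise scorer to the extent the encoder learns to.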