Reviewer assignment is increasingly critical yet challenging in the LLM era, where rapid topic shifts render many pre-2023 benchmarks outdated and where proxy signals poorly reflect true reviewer familiarity. We address this evaluation bottleneck by introducing LR-bench, a high-fidelity, up-to-date benchmark curated from 2024-2025 AI/NLP manuscripts, with five-level self-assessed familiarity ratings collected via a large-scale email survey, yielding 1,055 expert-annotated (paper, reviewer, score) triples. We further propose RATE, a reviewer-centric ranking framework that distills each reviewer's recent publications into a compact keyword-based profile and fine-tunes an embedding model with weak preference supervision constructed from heuristic retrieval signals, enabling each manuscript to be matched directly against reviewer profiles. Across LR-bench and the CMU gold-standard dataset, our approach consistently achieves state-of-the-art performance, outperforming strong embedding baselines by a clear margin. We release LR-bench at https://huggingface.co/datasets/Gnociew/LR-bench and our code at https://github.com/Gnociew/RATE-Reviewer-Assign.
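The reviewer-centric matching step described above can be sketched as embedding a manuscript once and scoring it against each reviewer's keyword profile by cosine similarity. The sketch below is illustrative only: `embed` is a toy deterministic stand-in for the fine-tuned embedding model, and the profile format (reviewer name mapped to a keyword list) is an assumption, not the paper's exact data schema.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    """Toy stand-in for the fine-tuned embedding model (NOT the real model).

    Deterministically seeds a random unit vector from the text so the
    sketch is self-contained and reproducible.
    """
    rng = np.random.default_rng(sum(map(ord, text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)


def rank_reviewers(
    manuscript: str, profiles: dict[str, list[str]]
) -> list[tuple[str, float]]:
    """Rank reviewers for one manuscript against their keyword profiles.

    Each profile is flattened into a single string, embedded, and scored
    by cosine similarity (dot product of unit vectors) against the
    manuscript embedding; reviewers are returned best-first.
    """
    q = embed(manuscript)
    scored = [
        (name, float(q @ embed(" ".join(keywords))))
        for name, keywords in profiles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a real fine-tuned encoder in place of `embed`, the same loop matches each incoming manuscript directly against precomputed reviewer profiles, so profiles need only be re-embedded when a reviewer's publication list changes.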