A System-Level Analysis of Conference Peer Review

The conference peer review process involves three constituencies with different objectives: authors want their papers accepted at prestigious venues (and quickly), conferences want to present a program with many high-quality and few low-quality papers, and reviewers want to avoid being overburdened by reviews. These objectives are far from aligned, primarily because the evaluation of a submission is inherently noisy. Over the years, conferences have experimented with numerous policies to navigate the tradeoffs, including setting various bars for acceptance, varying the number of reviews per submission, and requiring prior reviews to be included with resubmissions. In this work, we investigate, both analytically and empirically, how well various policies work and, more importantly, why they do or do not work. We model the conference-author interactions as a Stackelberg game in which a prestigious conference commits to an acceptance policy; the authors best-respond by (re)submitting or not (re)submitting to the conference in each round of review, the alternative being a "sure accept" (such as a lightly refereed venue). Our main results include the following observations: (1) The conference should typically set its acceptance threshold higher than the quality it actually desires; we call this difference the "resubmission gap". (2) The reviewing load is heavily driven by resubmissions of borderline papers; therefore, a judicious choice of acceptance threshold may lead to fewer reviews while incurring an acceptable loss in conference quality. (3) Conference prestige, reviewer inaccuracy, and author patience increase the resubmission gap, and thus increase the review load for a fixed level of conference quality. For robustness, we further consider different models of paper quality and compare our theoretical results to simulations based on plausible parameters estimated from real data.
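To make the intuition behind noisy reviews and resubmissions concrete, the following is a minimal toy simulation, not the model analyzed in this work. It assumes standard normal paper quality, Gaussian review noise, and authors who simply resubmit until accepted or until a fixed patience limit is reached, so it omits the strategic best response of the Stackelberg game. All function names, parameters, and default values are hypothetical.

```python
import random


def simulate(threshold, noise_sd=1.0, max_rounds=5, n_papers=20000, seed=0):
    """Toy resubmission process (illustrative assumptions only).

    Each paper has a true quality q ~ N(0, 1). Each round of review yields a
    noisy score q + N(0, noise_sd). The paper is accepted the first time its
    score reaches the threshold; otherwise the author resubmits, for at most
    max_rounds rounds (a crude stand-in for author patience).
    Returns the true qualities of accepted papers and the total review count.
    """
    rng = random.Random(seed)
    accepted = []
    reviews = 0
    for _ in range(n_papers):
        q = rng.gauss(0.0, 1.0)
        for _ in range(max_rounds):
            reviews += 1
            if q + rng.gauss(0.0, noise_sd) >= threshold:
                accepted.append(q)
                break
    return accepted, reviews


def mean_quality(accepted):
    """Average true quality of the accepted papers."""
    return sum(accepted) / len(accepted)


def share_below(accepted, level):
    """Fraction of accepted papers whose true quality is below `level`."""
    return sum(1 for q in accepted if q < level) / len(accepted)


if __name__ == "__main__":
    for rounds in (1, 5):
        acc, rev = simulate(threshold=1.0, max_rounds=rounds)
        print(
            f"rounds={rounds} accepted={len(acc)} reviews={rev} "
            f"mean_quality={mean_quality(acc):.3f} "
            f"share_below_threshold={share_below(acc, 1.0):.3f}"
        )
```

Under these assumptions, allowing resubmissions gives borderline and weaker papers several chances at a lucky review, so the accepted pool drifts below the nominal threshold while the review count grows. This is the kind of effect that would motivate a threshold set above the desired quality.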
