The rapid growth of AI conferences is straining an already fragile peer-review system, leading to heavy reviewer workloads, expertise mismatches, inconsistent evaluation standards, superficial or templated reviews, and limited accountability under compressed timelines. In response, conference organizers have introduced new policies and interventions to preserve review standards. Yet these ad-hoc changes often create further concerns and confusion about the review process, leaving it largely opaque how papers are ultimately accepted and how practices evolve across years. We present Paper Copilot, a system that creates durable digital archives of peer reviews across a wide range of computer-science venues; an open dataset that enables researchers to study peer review at scale; and a large-scale empirical analysis of ICLR reviews spanning multiple years. By releasing both the infrastructure and the dataset, Paper Copilot supports reproducible research on the evolution of peer review. We hope these resources help the community track changes, diagnose failure modes, and inform evidence-based improvements toward a more robust, transparent, and reliable peer-review system.