In this work, we explore the dynamics of Large Language Model (LLM) agent reviewers in an Elo-ranked review system using real-world conference paper submissions. Multiple LLM agent reviewers with different personas engage in multi-round review interactions moderated by an Area Chair. We compare a baseline setting with conditions that incorporate Elo ratings and reviewer memory. Our simulation results reveal several notable findings: incorporating Elo improves Area Chair decision accuracy, but reviewers also develop adaptive strategies that exploit the Elo system without increasing review effort. Our code is available at https://github.com/hsiangwei0903/EloReview.
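The abstract does not specify how Elo ratings are computed in the review system, but the standard pairwise Elo update is a reasonable reference point. The sketch below is a minimal, generic implementation of that update rule (the `k=32` factor and rating scale are conventional defaults, not details taken from this work):

```python
def elo_expected(r_a, r_b):
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw.
    k controls how strongly a single outcome moves the ratings.
    """
    e_a = elo_expected(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b
```

For example, when two equally rated reviewers (1000 each) are compared and A prevails, A gains 16 points and B loses 16, so the total rating mass is conserved.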