The adoption of large language models (LLMs) is transforming the peer review process, from assisting reviewers in writing detailed evaluations to generating entire reviews automatically. While these capabilities offer new opportunities, they also raise concerns about fairness and reliability. In this paper, we investigate bias in LLM-generated peer reviews through controlled interventions on author metadata, including affiliation, gender, seniority, and publication history. Our analysis consistently shows a strong affiliation bias favoring authors from highly ranked institutions. We also identify directional preferences associated with seniority and prior publication record, which can influence acceptance decisions for borderline papers. Gender effects are smaller but present in several models. Notably, implicit biases become more pronounced when examining token-level soft ratings, suggesting that alignment may mask, but not fully eliminate, underlying preferences.
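One way to read "token-level soft ratings" is as the expected rating under the model's probability distribution over rating tokens, rather than the single rating it prints. The sketch below illustrates that reading only; it is not taken from the paper. The function name `soft_rating`, the 1 to 10 scale, and the log-probabilities are all hypothetical, chosen to show how two metadata conditions can share the same printed rating while differing in the soft rating.

```python
import math


def soft_rating(logprobs: dict[str, float], scale=range(1, 11)) -> float:
    """Expected rating under the renormalized distribution over rating tokens.

    `logprobs` maps candidate tokens to log-probabilities at the position
    where the model emits its rating (assumed to be a single token).
    """
    # Keep only tokens that are valid ratings on the assumed scale.
    weights = {r: math.exp(logprobs[str(r)]) for r in scale if str(r) in logprobs}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no rating tokens found in logprobs")
    # Renormalize over rating tokens and take the probability-weighted mean.
    return sum(r * w for r, w in weights.items()) / total


def hard_rating(logprobs: dict[str, float]) -> int:
    """The rating the model would print under greedy decoding."""
    return int(max(logprobs, key=logprobs.get))


# Hypothetical log-probabilities for the same paper under two affiliations.
high_ranked = {"5": math.log(0.2), "6": math.log(0.5), "7": math.log(0.3)}
low_ranked = {"5": math.log(0.4), "6": math.log(0.5), "7": math.log(0.1)}

# Both conditions print the same rating, but the soft ratings differ.
gap = soft_rating(high_ranked) - soft_rating(low_ranked)
```

In this made-up example the printed rating is 6 in both conditions, while the soft rating shifts by about 0.4 points, which is the kind of difference a comparison of printed ratings alone would miss.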