Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI

With the rapid advancement of Large Language Models (LLMs), the academic community has faced unprecedented disruptions, particularly in academic communication. The primary function of peer review is to improve the quality of academic manuscripts along evaluation aspects such as clarity and originality. Although prior studies suggest that LLMs are beginning to influence peer review, it remains unclear whether they are altering its core evaluative functions. Moreover, the extent to which LLMs affect the linguistic form, evaluative focus, and recommendation-related signals of peer review reports has yet to be systematically examined. In this study, we examine changes in peer review reports for academic articles following the emergence of LLMs, emphasizing variations at a fine-grained level. Specifically, we investigate linguistic features such as the length and complexity of words and sentences in review comments, and we automatically annotate the evaluation aspects of individual review sentences. We also use a previously established maximum likelihood estimation method to identify review reports that may have been modified or generated by LLMs. Finally, we assess how the evaluation aspects mentioned in LLM-assisted review reports affect the informativeness of recommendations for paper decision-making. The results indicate that, following the emergence of LLMs, peer review texts have become longer and more fluent, with increased emphasis on summaries and surface-level clarity and more standardized linguistic patterns, particularly among reviewers with lower confidence scores. At the same time, attention to deeper evaluative dimensions, such as originality, replicability, and nuanced critical reasoning, has declined.
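To make the length-based linguistic features concrete, the following is a minimal sketch of how word and sentence lengths might be computed for a single review. The function name `surface_features`, the regex-based sentence splitter, and the word pattern are illustrative assumptions, not the study's actual pipeline, which may rely on a proper tokenizer and additional complexity measures.

```python
import re


def surface_features(review: str) -> dict:
    """Compute simple length-based features for one review text.

    Hypothetical helper for illustration only; not the study's implementation.
    """
    # Assumption: a naive split after ., ! or ? followed by whitespace.
    # Abbreviations such as "e.g." will be split incorrectly.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", review.strip()) if s]
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", review)

    n_sent = len(sentences)
    n_words = len(words)
    return {
        "num_sentences": n_sent,
        "num_words": n_words,
        # Average number of words per sentence.
        "avg_sentence_length": n_words / n_sent if n_sent else 0.0,
        # Average number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    print(surface_features("The paper is clear. The method lacks novelty!"))
```

Comparing such features between reviews written before and after LLMs became widely available is one simple way to probe the "longer and more fluent" trend described above; a regex splitter is used here only to keep the sketch free of dependencies.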
