Can AI Review Improve Paper Drafting? An Empirical Study on 20 Computer Architecture Submissions

Research is advancing faster than ever with artificial intelligence (AI), and so is the output of research papers. The exploding volume of AI-generated papers has put a strain on peer review, leading to the use of AI-generated reviews, which is potentially widespread yet covert. However, this raises ethical concerns about confidentiality, quality, and fairness, and the broad research community has reached no consensus. We expect the debate to continue for a while, but in the meantime, we ask an alternative, practical question: \textit{can AI review improve paper drafting?} We study 20 computer architecture papers, with varying levels of submission lineage, to expose how well AI review aligns with human review, quantified by a set of metrics we define. To conduct the case study, we build a web UI-integrated tool, \emph{AI-Paper-Review}, that generates structured AI review of a draft paper, available at https://github.com/unarylab/ai-paper-review. This tool selects several AI reviewers from a diverse pool, then clusters and ranks their comments based on commonality and importance. It also allows aligning AI comments with human comments to facilitate metric-based validation. The case study shows that AI review can cover a significant fraction of human-raised issues, but also raises issues missing from human review. This paper is not intended to encourage using AI for peer review at the current stage, but to study (1) how AI review can improve paper drafting and (2) the potential and limitations of AI-based peer review. The release of the tool and the case study data is intended to stimulate future research on this topic. Misuse for peer review would violate the ethics policies of major academic venues.
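The cluster, rank, and align steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the AI-Paper-Review implementation: the function names, the comment fields (`reviewer`, `text`, `importance`), the token-overlap (Jaccard) similarity, the 0.5 threshold, and the coverage metric are all hypothetical, and the paper's actual metrics are not specified in the abstract.

```python
import re

def tokens(text):
    # Lowercased word set; a crude stand-in for a real similarity model.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    # Token-set overlap in [0, 1]; 0.0 when both texts are empty.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def cluster_comments(comments, threshold=0.5):
    # Greedy clustering: a comment joins the first cluster whose
    # seed (first) comment is similar enough, else starts a new one.
    clusters = []
    for c in comments:
        for cl in clusters:
            if jaccard(c["text"], cl[0]["text"]) >= threshold:
                cl.append(c)
                break
        else:
            clusters.append([c])
    return clusters

def rank_clusters(clusters):
    # Commonality = number of distinct reviewers in the cluster;
    # ties are broken by the highest importance score in the cluster.
    def key(cl):
        return (len({c["reviewer"] for c in cl}),
                max(c["importance"] for c in cl))
    return sorted(clusters, key=key, reverse=True)

def human_coverage(human_comments, clusters, threshold=0.5):
    # Fraction of human comments matched by at least one AI comment.
    if not human_comments:
        return 0.0
    ai_texts = [c["text"] for cl in clusters for c in cl]
    hits = sum(any(jaccard(h, a) >= threshold for a in ai_texts)
               for h in human_comments)
    return hits / len(human_comments)

# Hypothetical data: three AI reviewers and two human comments.
ai_comments = [
    {"reviewer": "r1", "text": "evaluation lacks a baseline comparison", "importance": 3},
    {"reviewer": "r2", "text": "the evaluation lacks a baseline comparison", "importance": 2},
    {"reviewer": "r3", "text": "figure labels are too small", "importance": 1},
]
human_comments = [
    "evaluation lacks baseline comparison",
    "related work is missing",
]

ranked = rank_clusters(cluster_comments(ai_comments))
coverage = human_coverage(human_comments, ranked)
print(len(ranked), coverage)
```

In this toy input, the two baseline comments merge into one cluster that ranks first because two distinct reviewers raised it, and one of the two human comments is matched by an AI comment. A practical tool would likely replace the token-overlap similarity with a semantic one.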

