Research is advancing faster than ever with artificial intelligence (AI), and the number of research papers is growing with it. The exploding volume of AI-generated papers has put a strain on peer review, leading to the use of AI-generated reviews, a practice that is potentially widespread yet covert. However, this practice raises ethical concerns about confidentiality, quality, and fairness, and the broad research community has reached no consensus. We expect the debate to continue for a while; in the meantime, we ask an alternative, practical question: \textit{can AI review improve paper drafting?} We study 20 computer architecture papers, with varying levels of submission lineage, to expose how well AI review aligns with human review, quantified by a set of metrics we define. To conduct the case study, we build a tool with an integrated web UI, \emph{AI-Paper-Review} (https://github.com/unarylab/ai-paper-review), that generates a structured AI review of a draft paper. The tool selects several AI reviewers from a diverse pool, then clusters and ranks their comments by commonality and importance. It also supports aligning AI comments with human comments to facilitate metric-based validation. The case study shows that AI review can cover a significant fraction of human-raised issues, and that it also raises issues missing from human review. This paper is not intended to encourage using AI for peer review at the current stage, but to study (1) how AI review can improve paper drafting and (2) the potential and limitations of AI-based peer review. The release of the tool and the case study data is intended to stimulate future research on this topic. Misuse for peer review would violate the ethics policies of major academic venues.