Artificial intelligence (AI) is accelerating research, and the volume of research papers is growing with it. The exploding volume of AI-generated papers has put a strain on peer review, leading to the use of AI-generated reviews, which is potentially widespread yet covert. However, this practice raises ethical concerns about confidentiality, quality, and fairness, and the broad research community has reached no consensus. We expect the debate to continue for a while; in the meantime, we ask an alternative, practical question: \textit{can AI review improve paper drafting?} We study 20 computer architecture papers with varying levels of submission lineage to expose how well AI review aligns with human review, quantified by a set of metrics we define. To conduct the case study, we build a tool with an integrated web UI, \emph{AI-Paper-Review}, that generates a structured AI review of a draft paper, available at https://github.com/unarylab/ai-paper-review. The tool selects several AI reviewers from a diverse pool, then clusters and ranks their comments based on commonality and importance. It also supports aligning AI comments with human comments to facilitate metric-based validation. The case study shows that AI review can cover a significant fraction of human-raised issues and also raises issues missing from human review. This paper is not intended to encourage using AI for peer review at the current stage, but to study (1) how AI review can improve paper drafting and (2) the potential and limitations of AI-based peer review. The release of the tool and the case study data is intended to instigate future research on this topic. Misuse for peer review would violate the ethics policies of major academic venues.
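As a rough illustration of what aligning AI comments with human comments could look like, the Python sketch below matches each human comment to its most similar AI comment and reports the fraction of human comments covered, along with the AI comments that no human raised. Everything in it is an assumption made for illustration: the function names, the token-overlap (Jaccard) matcher, and the 0.3 threshold are hypothetical and are not taken from AI-Paper-Review, and the metrics actually used in the case study are not specified in this abstract.

```python
import re


def tokens(text):
    """Lowercase word set; a crude stand-in for semantic similarity (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap between two comments, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align(ai_comments, human_comments, threshold=0.3):
    """Map each human comment index to its best-matching AI comment index.

    A human comment maps to None when no AI comment reaches the
    (hypothetical) similarity threshold.
    """
    alignment = {}
    for h_idx, human in enumerate(human_comments):
        best_idx, best_score = None, threshold
        for a_idx, ai in enumerate(ai_comments):
            score = jaccard(ai, human)
            if score >= best_score:
                best_idx, best_score = a_idx, score
        alignment[h_idx] = best_idx
    return alignment


def coverage(alignment):
    """Fraction of human comments matched by at least one AI comment."""
    if not alignment:
        return 0.0
    matched = sum(1 for a_idx in alignment.values() if a_idx is not None)
    return matched / len(alignment)


def ai_only(ai_comments, alignment):
    """AI comments that were not matched to any human comment."""
    matched = {a_idx for a_idx in alignment.values() if a_idx is not None}
    return [c for i, c in enumerate(ai_comments) if i not in matched]
```

Calling `align` on two lists of comment strings and passing the result to `coverage` and `ai_only` gives the two quantities the case study reports qualitatively: how much of the human review the AI review covers, and what the AI review raises beyond it. A real system would likely replace the token-overlap matcher with a semantic one, since reviewers often phrase the same issue in different words.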