Peer review is the method employed by the scientific community for evaluating research advancements. In the field of cybersecurity, double-blind peer review is the de facto standard. This paper touches on the holy grail of peer reviewing and aims to shed light on the performance of AI in reviewing for academic security conferences. Specifically, we investigate the predictability of reviewing outcomes by comparing the results obtained from human reviewers and machine-learning models. To facilitate our study, we construct a comprehensive dataset by collecting thousands of papers from renowned computer science conferences and the arXiv preprint website. Based on the collected data, we evaluate the prediction capabilities of ChatGPT and of a two-stage classification approach that combines the Doc2Vec model with various classifiers. In our experimental evaluation of review outcome prediction, the Doc2Vec-based approach performs significantly better than ChatGPT and achieves an accuracy of over 90%. In analyzing the experimental results, we identify the potential advantages and limitations of the tested ML models. We explore areas within the paper-reviewing process that can benefit from automated support, while also recognizing the irreplaceable role of human intellect in aspects that state-of-the-art AI techniques cannot match.
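The two-stage idea mentioned in the abstract (first map each paper to a fixed-length vector, then classify that vector) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: a hashed bag-of-words embedding stands in for Doc2Vec, a nearest-centroid rule stands in for the "various classifiers", and all names (`embed`, `fit_centroids`, `predict`, `accuracy`, `DIM`) as well as the toy texts and labels are hypothetical.

```python
import hashlib
import math
from collections import defaultdict

DIM = 64  # embedding size; an arbitrary choice for this sketch


def embed(text, dim=DIM):
    """Stage 1 stand-in: hashed bag-of-words vector, L2-normalised.

    The paper uses Doc2Vec here; this is only a dependency-free placeholder.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # Stable hash so the mapping is reproducible across runs.
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def fit_centroids(texts, labels):
    """Stage 2 stand-in: one mean embedding per label (nearest-centroid)."""
    sums = defaultdict(lambda: [0.0] * DIM)
    counts = defaultdict(int)
    for text, label in zip(texts, labels):
        for i, v in enumerate(embed(text)):
            sums[label][i] += v
        counts[label] += 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}


def predict(centroids, text):
    """Return the label whose centroid has the largest dot product."""
    vec = embed(text)
    return max(
        centroids,
        key=lambda lab: sum(a * b for a, b in zip(vec, centroids[lab])),
    )


def accuracy(centroids, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    hits = sum(predict(centroids, t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Made-up toy data, purely illustrative; not taken from the paper's dataset.
    train_texts = [
        "we evaluate the attack on real systems",
        "preliminary idea without evaluation",
    ]
    train_labels = ["accept", "reject"]
    model = fit_centroids(train_texts, train_labels)
    print(predict(model, "a thorough evaluation on real systems"))
    print(accuracy(model, train_texts, train_labels))
```

Keeping the embedding and the classifier as separate stages means either one can be swapped independently, which is what allows a single document representation to be paired with several classifiers.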