We present OpenReviewer, an open-source system for generating high-quality peer reviews of machine learning and AI conference papers. At its core is Llama-OpenReviewer-8B, an 8B-parameter language model fine-tuned on 79,000 expert reviews from top ML conferences. Given a PDF paper submission and a review template as input, OpenReviewer extracts the full text, including technical content such as equations and tables, and generates a structured review following conference-specific guidelines. Our evaluation on 400 test papers shows that OpenReviewer produces significantly more critical and realistic reviews than general-purpose LLMs like GPT-4 and Claude-3.5. While other LLMs tend toward overly positive assessments, OpenReviewer's recommendations closely match the distribution of human reviewer ratings. The system provides authors with rapid, constructive feedback to improve their manuscripts before submission, though it is not intended to replace human peer review. OpenReviewer is available as an online demo and an open-source tool.