Empirical research on code review processes is increasingly central to understanding software quality and collaboration. However, collecting and analyzing review data remains a time-consuming and technically demanding task. Most researchers follow similar workflows: writing ad hoc scripts to extract, filter, and analyze review data from platforms such as GitHub and GitLab. This paper introduces RevMine, a conceptual tool that streamlines the entire code review mining pipeline using large language models (LLMs). RevMine guides users through authentication, endpoint discovery, and natural language-driven data collection, significantly reducing the need for manual scripting. After retrieving review data, it supports both quantitative and qualitative analysis based on user-defined filters or LLM-inferred patterns. This poster outlines the tool's architecture, use cases, and research potential. By lowering the barrier to entry, RevMine aims to democratize code review mining and enable a broader range of empirical software engineering studies.
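To illustrate the kind of ad hoc extraction scripting that RevMine aims to replace, the sketch below collects review records for a repository's pull requests through the GitHub REST API. It is a minimal, hedged example, not part of RevMine itself: the repository name is a placeholder, and a personal access token is assumed to be available in the GITHUB_TOKEN environment variable.

```python
# Minimal ad hoc review-mining script (illustrative only).
# Assumes a GitHub personal access token in the GITHUB_TOKEN environment variable.
import os
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"token {os.environ.get('GITHUB_TOKEN', '')}",
    "Accept": "application/vnd.github+json",
}

def list_pull_requests(owner, repo, state="all", per_page=100):
    """Yield pull requests for a repository, following GitHub's page-based pagination."""
    page = 1
    while True:
        resp = requests.get(
            f"{API}/repos/{owner}/{repo}/pulls",
            headers=HEADERS,
            params={"state": state, "per_page": per_page, "page": page},
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        yield from batch
        page += 1

def get_reviews(owner, repo, number):
    """Fetch the review records attached to a single pull request."""
    resp = requests.get(
        f"{API}/repos/{owner}/{repo}/pulls/{number}/reviews",
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Placeholder repository: print who reviewed each pull request and their verdict.
    for pr in list_pull_requests("octocat", "Hello-World"):
        for review in get_reviews("octocat", "Hello-World", pr["number"]):
            print(pr["number"], review["user"]["login"], review["state"])
```

Scripts of this kind must typically be rewritten for each platform, filter, and research question; RevMine's premise is to replace them with natural language instructions over the same authenticated endpoints.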