Context: The emergence of Large Language Models (LLMs) has significantly transformed Software Engineering (SE) by providing innovative methods for analyzing software repositories. Objectives: We aim to establish a practical framework for SE researchers who want to improve data collection and dataset construction when conducting software repository mining studies with LLMs. Method: This experience report shares insights from two previous repository mining studies, focusing on the methodologies used to create, refine, and validate prompts that improve LLM output, particularly for data collection in empirical studies. Results: We distill our insights into a framework, named Prompt Refinement and Insights for Mining Empirical Software repositories (PRIMES), consisting of a checklist that improves LLM performance, enhances output quality, and reduces errors through iterative prompt refinement and comparisons across different LLMs. We also emphasize reproducibility by implementing mechanisms for tracking model results. Conclusion: Our findings indicate that standardizing prompt engineering and using PRIMES can enhance the reliability and reproducibility of studies that rely on LLMs. Ultimately, this work calls for further research to address challenges such as hallucinations, model biases, and cost-effectiveness when integrating LLMs into research workflows.