Software Engineering (SE) research involving the use of Large Language Models (LLMs) has introduced several new challenges related to benchmarking rigour, data contamination, replicability, and sustainability. In this paper, we invite the research community to reflect on how these challenges are addressed in SE. Our results provide a structured overview of current LLM-based SE research at ICSE, highlighting both encouraging practices and persistent shortcomings. We conclude with recommendations to strengthen benchmarking rigour, improve replicability, and address the financial and environmental costs of LLM-based SE.