Software Engineering (SE) research involving the use of Large Language Models (LLMs) has introduced several new challenges related to benchmarking rigour, data contamination, replicability, and sustainability. In this paper, we invite the research community to reflect on how these challenges are addressed in SE. Our results provide a structured overview of current LLM-based SE research at ICSE, highlighting both encouraging practices and persistent shortcomings. We conclude with recommendations to strengthen benchmarking rigour, improve replicability, and address the financial and environmental costs of LLM-based SE research.