The increasing use of language models in automated software testing raises concerns about their environmental impact, yet existing sustainability analyses focus almost exclusively on large language models. As a result, the energy and carbon characteristics of small language models (SLMs) during test generation remain largely unexplored. To address this gap, this work introduces the DeCEAT framework, which systematically evaluates the environmental and performance trade-offs of SLMs using the HumanEval benchmark and adaptive prompt variants based on the Anthropic template. The framework quantifies emission- and time-aware behavior under controlled conditions: CodeCarbon measures energy consumption and carbon emissions, and unit test coverage assesses the quality of the generated tests. Our results show that different SLMs exhibit distinct sustainability strengths: some prioritize lower energy use and faster execution, while others maintain higher stability or accuracy under carbon constraints. These findings demonstrate that sustainability in SLM-driven test generation is multidimensional and strongly shaped by prompt design. This work thus provides a sustainability evaluation framework tailored to automated SLM-based test generation, clarifying how prompt structure and model choice jointly influence environmental and performance outcomes.