Automatically generated software, especially code produced by Large Language Models (LLMs), is increasingly adopted to accelerate development and reduce manual effort. However, little is known about the long-term reliability of such systems under sustained execution. In this paper, we experimentally investigate the phenomenon of software aging in applications generated by LLM-based tools. Using the Bolt platform and standardized prompts from Baxbench, we generated four service-oriented applications and subjected them to 50-hour load tests. Resource usage, response time, and throughput were continuously monitored to detect degradation patterns. The results reveal significant evidence of software aging, including progressive memory growth, increasing response times, and performance instability across all applications. Statistical analyses confirm these trends and highlight variability in the severity of aging across application types. Our findings underscore the need to account for aging in automatically generated software and provide a foundation for future studies on mitigation strategies and long-term reliability evaluation.
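As an illustrative sketch (not the paper's actual analysis pipeline), trend detection of the kind described here is often done with a Mann-Kendall test on a monitored metric such as resident memory; the memory trace below is synthetic and purely hypothetical:

```python
# Minimal Mann-Kendall trend test, a statistic commonly used in
# software-aging studies to decide whether a monitored metric
# (e.g. resident memory) drifts upward over a long-running load test.
import math

def mann_kendall(series):
    """Return (S, Z): the Mann-Kendall statistic and its normal score.
    Z > 1.96 indicates a significant increasing trend at the 5% level."""
    n = len(series)
    s = 0
    # S counts concordant minus discordant pairs over all i < j.
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S without tie correction (adequate for continuous metrics).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

# Hypothetical 50-sample memory trace (MB): a slow leak plus deterministic noise.
mem = [100 + 0.5 * t + ((t * 7919) % 13 - 6) * 0.3 for t in range(50)]
s, z = mann_kendall(mem)
print(f"S={s}, Z={z:.2f}")  # a large positive Z flags progressive memory growth
```

The same test applies unchanged to response-time or throughput series; a significant positive Z on memory together with one on response time is the classic aging signature the abstract describes.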