Configuration settings are essential for tailoring software behavior to meet specific performance requirements. However, incorrect configurations are widespread, and identifying those that impact system performance is challenging due to the vast number and complexity of possible settings. In this work, we present PerfSense, a lightweight framework that leverages Large Language Models (LLMs) to efficiently identify performance-sensitive configurations with minimal overhead. PerfSense employs LLM agents to simulate interactions between developers and performance engineers using advanced prompting techniques such as prompt chaining and retrieval-augmented generation (RAG). Our evaluation of seven open-source Java systems demonstrates that PerfSense achieves an average accuracy of 64.77% in classifying performance-sensitive configurations, outperforming both our LLM baseline (50.36%) and the previous state-of-the-art method (61.75%). Notably, our prompt chaining technique improves recall by 10% to 30% while maintaining similar precision levels. Additionally, a manual analysis of 362 misclassifications reveals common issues, including LLMs' misunderstandings of requirements (26.8%). In summary, PerfSense significantly reduces manual effort in classifying performance-sensitive configurations and offers valuable insights for future LLM-based code analysis research.
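The prompt-chaining idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the agent roles, prompts, and the `fake_llm` stand-in for a real LLM API are all hypothetical, meant only to show how a developer agent's output feeds the performance-engineer agent's classification step.

```python
# Hedged sketch (not PerfSense's actual code): two-stage prompt chaining with
# LLM "agents". A developer agent summarizes how a configuration option is
# used; a performance-engineer agent classifies it from that summary.
# fake_llm is a placeholder for a real LLM API call.

def fake_llm(prompt: str) -> str:
    # Placeholder: a real system would send the prompt to an LLM here.
    if "Summarize" in prompt:
        return "This option controls the size of an in-memory cache."
    return "performance-sensitive" if "cache" in prompt else "not performance-sensitive"

def developer_agent(option: str, code_context: str) -> str:
    # Chain step 1: summarize the option's role in the surrounding code.
    prompt = f"Summarize how the configuration option '{option}' is used:\n{code_context}"
    return fake_llm(prompt)

def performance_engineer_agent(option: str, summary: str) -> str:
    # Chain step 2: classify using the step-1 summary rather than raw code.
    prompt = f"Given this summary of '{option}': {summary}\nClassify it."
    return fake_llm(prompt)

def classify(option: str, code_context: str) -> str:
    summary = developer_agent(option, code_context)
    return performance_engineer_agent(option, summary)

print(classify("cache.max.entries", 'int max = conf.getInt("cache.max.entries");'))
# -> performance-sensitive
```

Chaining the two prompts keeps each one short and focused, which is one plausible reason the paper reports recall gains from this technique over a single monolithic prompt.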