While large language models (LLMs) are increasingly used for generating parallel scientific codes, most efforts emphasize functional correctness, often overlooking performance, especially energy efficiency. We propose LASSI-EE, an automated LLM-based refactoring framework that generates energy-efficient parallel codes through a multi-stage, iterative approach integrating runtime power profiling, energy-aware prompting, self-correcting feedback loops, and an LLM-as-a-Judge agent for automated screening of code solutions. We introduce energy-reduction@k, a novel metric that quantifies expected energy reduction when generating k code candidates and selecting the most energy-efficient, enabling systematic evaluation of multi-attempt generation strategies. Evaluating 20 HeCBench applications and two miniApps on NVIDIA A100 and AMD MI100 GPUs, a single run (k=1) with LASSI-EE delivers refactored parallel codes with an average 29% expected energy reduction at an 81% pass rate, representing a 2.8x improvement over vanilla LLM prompting. Multiple runs (k=3) achieve an average 48% expected energy reduction at a 97% pass rate. These results are consistent across devices, demonstrating LASSI-EE's effectiveness across diverse hardware architectures.
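The abstract describes energy-reduction@k informally as the expected energy reduction when k candidates are generated and the most energy-efficient one is selected. The paper's exact estimator is not given here; the sketch below is a hypothetical interpretation, analogous to pass@k-style metrics, that averages the best observed reduction over every size-k subset of the measured candidates (function name and the treatment of failed candidates are assumptions, not taken from the paper).

```python
from itertools import combinations
from statistics import mean

def energy_reduction_at_k(reductions, k):
    """Hypothetical energy-reduction@k estimator.

    reductions: fractional energy reductions measured for each of the n
        generated candidates (e.g. 0.29 for a 29% reduction); one plausible
        convention is to score a failing candidate as 0 or a negative value.
    k: number of candidates drawn per attempt (k <= n).

    Returns the mean, over all C(n, k) subsets of size k, of the best
    reduction in the subset, i.e. the expected outcome of generating k
    candidates and keeping the most energy-efficient one.
    """
    n = len(reductions)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(reductions)")
    return mean(max(subset) for subset in combinations(reductions, k))

# Illustration with made-up measurements: three candidates, one regression.
print(energy_reduction_at_k([0.10, 0.35, -0.05], k=2))
```

For k=1 this reduces to the plain average reduction across candidates, which matches the abstract's reading of a single run; larger k rewards multi-attempt generation by crediting only the best candidate.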