Performance profiles of GPU kernels, as produced by tools such as Nsight Compute, are rich in detail but often difficult to interpret. To achieve the best possible performance on a given GPU architecture, kernel developers must spend significant time analyzing and comparing profiles in the tool's graphical interface to identify and understand performance bottlenecks. Large Language Models (LLMs) have shown promise in understanding complex data and generating natural language explanations. In this paper, we propose the Kernel Execution Explanation Toolkit (KEET), an LLM-based agentic framework that interprets Nsight Compute profiles to generate useful, data-grounded natural language explanations of performance issues in GPU kernels, along with suggestions for optimization. We evaluate KEET using several CUDA kernels of varying complexity on NVIDIA H100 GPUs. We find that the generated explanations, when provided as context, improve the quality of LLM code optimization and multiple-choice question answering in downstream tasks. We further demonstrate that KEET can interpret performance data from large sets of profiles to improve the quality of optimization suggestions.