3D Gaussian splatting (3DGS) is a transformative technique with profound implications for novel view synthesis and real-time rendering. Given its importance, there have been many attempts to improve its performance. However, with the increasing complexity of GPU architectures and the vast search space of performance-tuning parameters, doing so is a challenging task. Although manual optimizations have achieved remarkable speedups, they require domain expertise, and the optimization process can be highly time-consuming and error-prone. In this paper, we propose to exploit large language models (LLMs) to analyze and optimize Gaussian splatting kernels. To our knowledge, this is the first work to use LLMs to optimize highly specialized real-world GPU kernels. We reveal the intricacies of using LLMs for code optimization and analyze the code optimization techniques they apply. We also propose ways to collaborate with LLMs to further leverage their capabilities. For the original 3DGS code on the MipNeRF360 datasets, LLMs achieve significant speedups: 19% with DeepSeek and 24% with GPT-5, demonstrating the differing capabilities of different LLMs. By feeding in additional information from performance profilers, the speedup of the LLM-optimized code increases to up to 42%, with an average of 38%. In comparison, our best-effort manually optimized version achieves a speedup of up to 48%, with an average of 39%, showing that some optimizations remain beyond the capabilities of current LLMs. On the other hand, even on Seele, a newly proposed 3DGS framework with algorithmic optimizations, LLMs can still improve performance by a further 6%, showing that domain experts also miss optimization opportunities. This highlights the potential of collaboration between domain experts and LLMs.