3D Gaussian Splatting (3DGS) has significantly advanced real-time novel view synthesis by representing scenes as dense collections of anisotropic 3D Gaussian primitives. However, the irregular spatial distribution of Gaussians often leads to poor GPU utilization, as warp divergence and redundant computation degrade rendering performance. To address this, we present Local-GS, a warp-coherent rendering paradigm that organizes Gaussian primitives with respect to SIMT (Single Instruction, Multiple Threads) execution boundaries rather than scene geometry. Specifically, we propose three warp-coherent stages: a hoisting stage that precomputes shared parameters at the tile level, a culling stage that discards warps with no contribution, and a blending stage that replaces per-pixel branching with a uniform instruction stream. Across extensive benchmarks on multiple datasets, Local-GS improves efficiency without compromising quality. As a plug-and-play optimization, it provides additional performance gains to all tested baselines, culminating in a $7.76\times$ speedup on Deep Blending scenes.
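The three stages can be illustrated with a small CPU-side simulation of a single warp. This is only a sketch under stated assumptions, not the Local-GS implementation: the function name `render_warp`, the dictionary layout of a Gaussian, the grayscale color, the `1/255` contribution threshold, and the `0.99` opacity clamp are all illustrative choices, and the warp-wide vote is emulated with Python's `any` where a GPU kernel would use a warp vote intrinsic.

```python
import math

WARP_SIZE = 32            # lanes per warp on NVIDIA GPUs
ALPHA_MIN = 1.0 / 255.0   # assumed contribution threshold (illustrative)
ALPHA_MAX = 0.99          # assumed opacity clamp (illustrative)


def render_warp(pixels, gaussians, tile_origin):
    """Blend depth-sorted 2D Gaussians for one warp of pixels.

    pixels      : list of (x, y) pixel centers, one per lane
    gaussians   : list of dicts with mean (mx, my), conic (a, b, c),
                  opacity and a scalar color; assumed sorted front to back
    tile_origin : (x, y) of the tile that contains the warp
    Returns per-lane color, per-lane transmittance, and the number of
    Gaussians skipped by the warp-wide vote.
    """
    ox, oy = tile_origin

    # Hoisting: values shared by every lane are computed once per tile,
    # here the Gaussian means and pixel centers relative to the tile origin.
    hoisted = [
        (g["mx"] - ox, g["my"] - oy, g["a"], g["b"], g["c"],
         g["opacity"], g["color"])
        for g in gaussians
    ]
    local = [(px - ox, py - oy) for px, py in pixels]

    color = [0.0] * len(pixels)
    trans = [1.0] * len(pixels)   # remaining transmittance per lane
    culled = 0

    for mx, my, a, b, c, opacity, col in hoisted:
        # Every lane evaluates the same expression (no per-lane branch).
        alphas = []
        for x, y in local:
            dx, dy = x - mx, y - my
            power = -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy
            alphas.append(min(ALPHA_MAX, opacity * math.exp(power)))

        # Culling: one warp-wide vote. The whole warp skips the Gaussian
        # only when no lane receives a visible contribution.
        if not any(al >= ALPHA_MIN for al in alphas):
            culled += 1
            continue

        # Blending: lanes below the threshold are masked to zero instead
        # of branching, so all lanes run the same instruction stream.
        for i, al in enumerate(alphas):
            al *= float(al >= ALPHA_MIN)
            color[i] += trans[i] * al * col
            trans[i] *= 1.0 - al

    return color, trans, culled
```

Calling `render_warp` with one Gaussian centered on the warp and one far outside it should report the distant Gaussian as culled for the whole warp, while lanes that are far from the nearby Gaussian keep a transmittance of 1 without taking a separate code path.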