Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency due to its point-sampling design. 3D Gaussian Splatting (3DGS) improves on NeRF with an explicit scene representation and an optimized rendering pipeline, yet it still fails to meet practical real-time demands. Existing acceleration work overlooks the evolving Tensor Cores of modern GPUs because the 3DGS pipeline lacks General Matrix Multiplication (GEMM) operations. This paper proposes GEMM-GS, an acceleration approach that exploits Tensor Cores through a GEMM-friendly blending transformation: it reformulates the 3DGS blending process into an equivalent GEMM-compatible form. A high-performance CUDA kernel is designed that integrates a three-stage double-buffered pipeline to overlap computation and memory access. Extensive experiments show that GEMM-GS achieves a $1.42\times$ speedup over vanilla 3DGS and provides an additional $1.47\times$ speedup on average when combined with existing acceleration approaches. Code is released at https://github.com/shieldforever/GEMM-GS.
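To make the idea of a GEMM-compatible form concrete, the sketch below shows one way a per-pixel Gaussian exponent can be rewritten as a matrix product. It is an illustration under stated assumptions, not the kernel from the GEMM-GS repository: the function names, the six-term monomial layout, and the sample values are all hypothetical. It relies only on the standard 3DGS exponent, which is a quadratic form in the pixel coordinates and can therefore be expanded into a dot product between a per-Gaussian coefficient vector and a per-pixel monomial vector.

```python
# Sketch only: names and the 6-term layout are illustrative assumptions,
# not taken from the GEMM-GS code base.

def direct_power(g, px, py):
    """Exponent of one 2D Gaussian at one pixel, as in vanilla 3DGS blending."""
    mx, my, a, b, c = g            # projected mean (mx, my), conic (a, b, c)
    dx, dy = mx - px, my - py
    return -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy

def gaussian_row(g):
    """Coefficients of [px^2, py^2, px*py, px, py, 1] after expanding the quadratic."""
    mx, my, a, b, c = g
    return [
        -0.5 * a,
        -0.5 * c,
        -b,
        a * mx + b * my,
        c * my + b * mx,
        -0.5 * a * mx * mx - 0.5 * c * my * my - b * mx * my,
    ]

def pixel_col(px, py):
    """Monomials of the pixel coordinates, matching gaussian_row's layout."""
    return [px * px, py * py, px * py, px, py, 1.0]

def matmul(A, B):
    """Plain (N x K) @ (K x P) product; a GPU kernel would map this onto Tensor Cores."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Two hypothetical Gaussians and a 4x4 tile of pixels.
gaussians = [(3.5, 2.0, 0.8, 0.1, 0.5), (1.0, 4.0, 0.3, -0.05, 0.9)]
pixels = [(float(x), float(y)) for y in range(4) for x in range(4)]

A = [gaussian_row(g) for g in gaussians]                   # N x 6
cols = [pixel_col(px, py) for px, py in pixels]
B = [[col[k] for col in cols] for k in range(6)]           # 6 x P

powers = matmul(A, B)                                      # N x P exponents in one GEMM

# Compare against the direct per-pixel evaluation.
max_err = max(
    abs(powers[i][j] - direct_power(gaussians[i], *pixels[j]))
    for i in range(len(gaussians)) for j in range(len(pixels))
)
```

In this sketch the opacity-weighted alpha of each Gaussian would still be obtained by applying `exp` to the entries of `powers`, and the front-to-back compositing step is left out; only the exponent evaluation is shown in matrix form.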