Neural Radiance Fields (NeRF) enable 3D scene reconstruction from multiple 2D images but incur high rendering latency because of their point-sampling design. 3D Gaussian Splatting (3DGS) improves on NeRF with an explicit scene representation and an optimized pipeline, yet it still fails to meet practical real-time demands. Existing acceleration work overlooks the evolving Tensor Cores of modern GPUs because the 3DGS pipeline lacks General Matrix Multiplication (GEMM) operations. This paper proposes GEMM-GS, an acceleration approach that utilizes Tensor Cores through a GEMM-friendly blending transformation, which equivalently reformulates the 3DGS blending process into a GEMM-compatible form. A high-performance CUDA kernel is designed around a three-stage double-buffered pipeline that overlaps computation and memory access. Extensive experiments show that GEMM-GS achieves a $1.42\times$ speedup over vanilla 3DGS and provides an additional $1.47\times$ speedup on average when combined with existing acceleration approaches. Code is released at https://github.com/shieldforever/GEMM-GS.
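To illustrate why an equivalent GEMM form of blending can exist, the sketch below uses the standard 3DGS exponent $-\tfrac{1}{2}(a\,d_x^2 + c\,d_y^2) - b\,d_x d_y$, which is a quadratic polynomial in the pixel coordinates. Expanding it separates a per-Gaussian coefficient vector from a per-pixel monomial vector, so the exponents for all Gaussian-pixel pairs become one matrix product. This is an assumed illustration of the general idea, not necessarily the exact transformation used by GEMM-GS; the function names, the 6-term layout, and the pure-Python `matmul` are hypothetical stand-ins for a Tensor Core kernel.

```python
# Sketch (assumption): express the 3DGS Gaussian exponent for every
# (Gaussian, pixel) pair as a single matrix product. All names are illustrative.

def gaussian_row(mx, my, a, b, c):
    """Per-Gaussian coefficients for the monomials [1, x, y, x^2, y^2, x*y].

    (mx, my) is the projected 2D mean; (a, b, c) are the conic
    (inverse 2D covariance) entries, as in the standard 3DGS rasterizer.
    """
    return [
        -0.5 * a * mx * mx - 0.5 * c * my * my - b * mx * my,  # constant term
        a * mx + b * my,   # coefficient of x
        c * my + b * mx,   # coefficient of y
        -0.5 * a,          # coefficient of x^2
        -0.5 * c,          # coefficient of y^2
        -b,                # coefficient of x*y
    ]

def pixel_col(x, y):
    """Per-pixel monomial vector; independent of any Gaussian."""
    return [1.0, x, y, x * x, y * y, x * y]

def direct_power(g, x, y):
    """Reference: the exponent evaluated pair by pair, as in vanilla 3DGS."""
    mx, my, a, b, c = g
    dx, dy = x - mx, y - my
    return -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy

def matmul(A, B):
    """Plain (n x k) @ (k x m) product; stands in for a Tensor Core GEMM."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def power_matrix_gemm(gaussians, pixels):
    """Exponent matrix of shape (num_gaussians x num_pixels) via one GEMM."""
    A = [gaussian_row(*g) for g in gaussians]                   # G x 6
    B = [list(r) for r in zip(*(pixel_col(x, y) for x, y in pixels))]  # 6 x P
    return matmul(A, B)
```

Each entry of the result equals `direct_power` for the corresponding pair up to floating-point rounding. In reduced precision the expanded form can suffer cancellation for large coordinates, so a practical kernel would likely work in tile-local coordinates; that detail is also an assumption here.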