High-quality view synthesis is essential for immersive applications but remains challenging, particularly in indoor environments and under real-time constraints. Current techniques often require long training and rendering times, and they frequently produce suboptimal 3D representations because the scene geometry is insufficiently structured. To address these limitations, we introduce VoxNeRF, an approach that leverages volumetric representations to improve both the quality and the efficiency of indoor view synthesis. First, VoxNeRF constructs a structured scene geometry and converts it into a voxel-based representation. We employ multi-resolution hash grids to adaptively capture spatial features, which helps handle the occlusions and intricate geometry of indoor scenes. Second, we propose a voxel-guided efficient sampling technique that concentrates computation on the most relevant portions of each ray segment, substantially reducing optimization time. We validate our approach on three public indoor datasets and demonstrate that VoxNeRF outperforms state-of-the-art methods. It achieves these gains while also reducing training and rendering times, surpassing even Instant-NGP in speed and bringing the technology closer to real-time performance.
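To make the idea of voxel-guided sampling concrete, the following is a minimal sketch rather than the VoxNeRF implementation. It assumes a boolean occupancy grid over an axis-aligned bounding box and simply discards uniformly spaced ray samples that fall in empty voxels. The function name `voxel_guided_samples`, its parameters, and the toy scene are all hypothetical and chosen only for illustration.

```python
import numpy as np


def voxel_guided_samples(origin, direction, occupancy, bounds_min, bounds_max,
                         near=0.0, far=1.0, n_candidates=64):
    """Return ray distances t whose sample points land in occupied voxels.

    Hypothetical helper: candidates are spaced uniformly in [near, far] and
    filtered by a boolean occupancy grid of shape (X, Y, Z).
    """
    origin = np.asarray(origin, dtype=np.float64)
    direction = np.asarray(direction, dtype=np.float64)
    direction = direction / np.linalg.norm(direction)
    bounds_min = np.asarray(bounds_min, dtype=np.float64)
    bounds_max = np.asarray(bounds_max, dtype=np.float64)

    # Uniform candidate distances and the 3D points they correspond to.
    t = np.linspace(near, far, n_candidates)
    points = origin + t[:, None] * direction

    # Map each point to an integer voxel index inside the bounding box.
    resolution = np.array(occupancy.shape)
    voxel_size = (bounds_max - bounds_min) / resolution
    idx = np.floor((points - bounds_min) / voxel_size).astype(np.int64)

    # Points outside the grid are treated as empty space.
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    keep = np.zeros(n_candidates, dtype=bool)
    keep[inside] = occupancy[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    return t[keep]


# Toy scene (assumed): an 8x8x8 grid over the unit cube with one occupied
# slab of voxels, standing in for a wall at 0.5 <= x < 0.625.
occupancy = np.zeros((8, 8, 8), dtype=bool)
occupancy[4, :, :] = True

t_kept = voxel_guided_samples(
    origin=[0.0, 0.5, 0.5],
    direction=[1.0, 0.0, 0.0],
    occupancy=occupancy,
    bounds_min=[0.0, 0.0, 0.0],
    bounds_max=[1.0, 1.0, 1.0],
)
print(f"kept {len(t_kept)} of 64 candidate samples")
```

In this sketch only the samples inside the occupied slab survive, so a radiance field would be queried at a small fraction of the candidate points. A practical system would more likely traverse the grid and place samples adaptively inside occupied intervals; the uniform-then-filter strategy here is a simplification.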