Feed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective solution for novel view synthesis. Existing methods predominantly rely on a \emph{pixel-aligned} Gaussian prediction paradigm, where each 2D pixel is mapped to a 3D Gaussian. We rethink this widely adopted formulation and identify several inherent limitations: it ties the reconstructed 3D model to the number of input views, leads to view-biased density distributions, and introduces alignment errors, particularly when source views contain occlusions or low-texture regions. To address these challenges, we introduce VolSplat, a new multi-view feed-forward paradigm that replaces pixel alignment with voxel-aligned Gaussians. By predicting Gaussians directly from a predicted 3D voxel grid, it overcomes pixel alignment's reliance on error-prone 2D feature matching and ensures robust multi-view consistency. Furthermore, it enables adaptive control over Gaussian density based on 3D scene complexity, yielding more faithful Gaussians, improved geometric consistency, and enhanced novel-view rendering quality. Experiments on widely used benchmarks demonstrate that VolSplat achieves state-of-the-art performance while producing more plausible and view-consistent results. Video results, code, and trained models are available on our project page: https://lhmd.top/volsplat.
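The contrast between the two paradigms can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: `pixel_aligned_gaussians` unprojects one Gaussian center per pixel per view (so the count grows with the number of views, and overlapping views duplicate geometry), while `voxel_aligned_gaussians` quantizes the same points into a voxel grid so that redundant observations collapse into a single Gaussian per occupied voxel. The function names, voxel size, and the use of simple averaging are illustrative assumptions.

```python
import numpy as np

def pixel_aligned_gaussians(depth_maps, intrinsics_inv):
    """One Gaussian center per pixel per view.

    The total count is views * H * W, so it scales with the number of
    input views and duplicates geometry seen from several cameras.
    """
    centers = []
    for depth in depth_maps:
        h, w = depth.shape
        ys, xs = np.mgrid[0:h, 0:w]
        pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)
        # Unproject each pixel to a 3D point along its camera ray.
        pts = (pix @ intrinsics_inv.T) * depth.reshape(-1, 1)
        centers.append(pts)
    return np.concatenate(centers, axis=0)

def voxel_aligned_gaussians(points, voxel_size=0.1):
    """One Gaussian center per occupied voxel.

    Points falling in the same voxel are merged (averaged here for
    simplicity), so density follows scene occupancy, not view count.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.reshape(-1)  # guard against NumPy-version shape differences
    centers = np.zeros((len(uniq), 3))
    np.add.at(centers, inv, points)
    counts = np.bincount(inv, minlength=len(uniq)).reshape(-1, 1)
    return centers / counts
```

With two identical depth maps (two views of the same surface), the pixel-aligned variant emits twice as many centers, while voxelization merges the duplicates back to one center per occupied cell.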