Wide-baseline panoramic images are frequently used in applications such as VR and simulation to reduce capture labor and storage costs. However, synthesizing novel views from these panoramic images in real time remains a significant challenge, especially due to panoramic imagery's high resolution and inherent distortions. Although existing 3D Gaussian splatting (3DGS) methods can produce photo-realistic views under narrow baselines, they often overfit the training views when dealing with wide-baseline panoramic images, owing to the difficulty of learning precise geometry from sparse 360$^{\circ}$ views. This paper presents \textit{Splatter-360}, a novel end-to-end generalizable 3DGS framework designed to handle wide-baseline panoramic images. Unlike previous approaches, \textit{Splatter-360} performs multi-view matching directly in the spherical domain by constructing a spherical cost volume through a spherical sweep algorithm, enhancing the network's depth perception and geometry estimation. Additionally, we introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images and integrate cross-view attention to improve feature interactions across multiple viewpoints. This enables robust 3D-aware feature representations and real-time rendering capabilities. Experimental results on the HM3D~\cite{hm3d} and Replica~\cite{replica} datasets demonstrate that \textit{Splatter-360} significantly outperforms state-of-the-art NeRF and 3DGS methods (e.g., PanoGRF, MVSplat, DepthSplat, and HiSplat) in both synthesis quality and generalization performance for wide-baseline panoramic images. Code and trained models are available at \url{https://3d-aigc.github.io/Splatter-360/}.
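To make the spherical-sweep idea concrete, the sketch below shows one plausible way such a cost volume can be built for equirectangular feature maps: each reference-view ray is back-projected at a set of hypothesized depths, transformed into the source view, re-projected to equirectangular coordinates, and correlated with the warped source features. This is a hypothetical, simplified illustration (nearest-neighbour sampling, single source view), not the paper's actual implementation; the function name, conventions, and pose format are assumptions.

```python
import numpy as np

def spherical_sweep_cost_volume(ref_feat, src_feat, rel_pose, depths):
    """Illustrative spherical-sweep cost volume for equirectangular
    (panoramic) features; not the paper's implementation.

    ref_feat, src_feat: (C, H, W) feature maps.
    rel_pose: (R, t) mapping reference-frame points into the source frame.
    depths: iterable of depth (radius) hypotheses.
    Returns: (D, H, W) matching-cost volume (feature correlation).
    """
    C, H, W = ref_feat.shape
    # Pixel grid -> spherical direction vectors (longitude/latitude).
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    lon = (u + 0.5) / W * 2 * np.pi - np.pi           # [-pi, pi)
    lat = np.pi / 2 - (v + 0.5) / H * np.pi           # [pi/2, -pi/2]
    dirs = np.stack([np.cos(lat) * np.sin(lon),       # unit ray directions
                     -np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)  # (H, W, 3)

    R, t = rel_pose
    cost = np.empty((len(depths), H, W), dtype=ref_feat.dtype)
    for d_idx, depth in enumerate(depths):
        # Back-project each ray at the hypothesized radius, move to src frame.
        pts = dirs * depth                            # (H, W, 3) in ref frame
        pts_src = pts @ R.T + t                       # (H, W, 3) in src frame
        r = np.linalg.norm(pts_src, axis=-1)
        # Re-project to source-view equirectangular coordinates.
        lon_s = np.arctan2(pts_src[..., 0], pts_src[..., 2])
        lat_s = np.arcsin(np.clip(-pts_src[..., 1] / np.maximum(r, 1e-8),
                                  -1.0, 1.0))
        us = ((lon_s + np.pi) / (2 * np.pi) * W).astype(int) % W
        vs = ((np.pi / 2 - lat_s) / np.pi * H).astype(int).clip(0, H - 1)
        warped = src_feat[:, vs, us]                  # nearest-neighbour warp
        # Dot-product correlation as matching cost (higher = better match).
        cost[d_idx] = (ref_feat * warped).sum(axis=0)
    return cost
```

In a learned pipeline this per-depth correlation would typically be regularized by a network and turned into a depth distribution; here it only illustrates the geometry of sweeping spherical depth hypotheses instead of fronto-parallel planes.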