Pose-free feed-forward 3D Gaussian Splatting (3DGS) has opened a new frontier for rapid 3D modeling, enabling high-quality Gaussian representations to be generated from uncalibrated multi-view images in a single forward pass. The dominant approach in this space adopts unified monolithic architectures, often built on geometry-centric 3D foundation models, to jointly estimate camera poses and synthesize 3DGS representations within a single network. While architecturally streamlined, such "all-in-one" designs may be suboptimal for high-fidelity 3DGS generation, as they entangle geometric reasoning and appearance modeling within a shared representation. In this work, we introduce 2Xplat, a pose-free feed-forward 3DGS framework based on a two-expert design that explicitly separates geometry estimation from Gaussian generation. A dedicated geometry expert first predicts camera poses, which are then explicitly passed to a powerful appearance expert that synthesizes 3D Gaussians. Although this design is conceptually simple and has been largely underexplored in prior work, it proves highly effective. With fewer than 5K training iterations, the two-expert pipeline substantially outperforms prior pose-free feed-forward 3DGS approaches and achieves performance on par with state-of-the-art posed methods. These results challenge the prevailing unified paradigm and suggest the potential advantages of modular design principles for complex 3D geometric estimation and appearance synthesis tasks.
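The two-expert data flow described in the abstract (poses are predicted first, then explicitly handed to the Gaussian generator) can be sketched as follows. This is a minimal illustration under stated assumptions, not the 2Xplat implementation: every name here (`two_expert_pipeline`, `toy_geometry_expert`, `toy_appearance_expert`, `Gaussian`) is hypothetical, and the two "experts" are toy stand-ins for what would in practice be learned networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical placeholder types, chosen only for this sketch.
Image = List[List[float]]  # H x W grayscale image as nested lists
Pose = List[List[float]]   # 4 x 4 camera-to-world matrix as nested lists


@dataclass
class Gaussian:
    """One 3D Gaussian primitive (hypothetical minimal parameterization)."""
    mean: Tuple[float, float, float]
    opacity: float
    color: Tuple[float, float, float]


def toy_geometry_expert(images: Sequence[Image]) -> List[Pose]:
    """Stand-in for a learned pose estimator: view i sits at x = i."""
    poses = []
    for i in range(len(images)):
        pose = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
        pose[0][3] = float(i)  # translate along x by the view index
        poses.append(pose)
    return poses


def toy_appearance_expert(images: Sequence[Image],
                          poses: Sequence[Pose]) -> List[Gaussian]:
    """Stand-in for a learned Gaussian generator that consumes given poses.

    Emits one Gaussian per view, placed at the camera center and colored
    by the mean image intensity.
    """
    gaussians = []
    for image, pose in zip(images, poses):
        pixels = [p for row in image for p in row]
        intensity = sum(pixels) / len(pixels)
        center = (pose[0][3], pose[1][3], pose[2][3])
        gaussians.append(Gaussian(mean=center, opacity=1.0,
                                  color=(intensity, intensity, intensity)))
    return gaussians


def two_expert_pipeline(
    images: Sequence[Image],
    geometry_expert: Callable[[Sequence[Image]], List[Pose]],
    appearance_expert: Callable[[Sequence[Image], Sequence[Pose]], List[Gaussian]],
) -> Tuple[List[Pose], List[Gaussian]]:
    """Run geometry first, then pass its poses explicitly to appearance."""
    poses = geometry_expert(images)
    if len(poses) != len(images):
        raise ValueError("geometry expert must return one pose per image")
    gaussians = appearance_expert(images, poses)
    return poses, gaussians


if __name__ == "__main__":
    views = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
    poses, gaussians = two_expert_pipeline(
        views, toy_geometry_expert, toy_appearance_expert)
    print(len(poses), len(gaussians))
```

The point of the sketch is the interface: the appearance stage receives poses as an explicit input rather than sharing an internal representation with the pose estimator, so either expert can be swapped independently.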