Feature extraction, matching, structure from motion (SfM), and novel view synthesis (NVS) have traditionally been treated as separate problems with independent optimization objectives. We present GloSplat, a framework that performs \emph{joint pose-appearance optimization} during 3D Gaussian Splatting training. Unlike prior joint optimization methods (BARF, NeRF--, 3RGS), which rely purely on photometric gradients for pose refinement, GloSplat preserves \emph{explicit SfM feature tracks} as first-class entities throughout training: the 3D points of each track are maintained as optimizable parameters separate from the Gaussian primitives, providing persistent geometric anchors via a reprojection loss that operates alongside the photometric supervision. This architectural choice prevents early-stage pose drift while still enabling fine-grained refinement, a capability absent in photometric-only approaches. We introduce two pipeline variants: (1) \textbf{GloSplat-F}, a COLMAP-free variant that uses retrieval-based pair selection for efficient reconstruction, and (2) \textbf{GloSplat-A}, an exhaustive-matching variant for maximum quality. Both employ global SfM initialization followed by joint photometric-geometric optimization during 3DGS training. Experiments demonstrate that GloSplat-F achieves state-of-the-art results among COLMAP-free methods, while GloSplat-A surpasses all COLMAP-based baselines.
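The joint objective described above can be sketched as a photometric term plus a track-reprojection term; the notation below (\(\lambda\), \(\pi\), \(\rho\), \(\mathcal{T}\)) is an illustrative assumption, not the paper's own:
\[
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{photo}} \;+\; \lambda \sum_{(i,j)\in\mathcal{T}} \rho\Big( \big\lVert \pi\big( R_i \mathbf{X}_j + \mathbf{t}_i \big) - \mathbf{x}_{ij} \big\rVert_2 \Big),
\]
where \(\mathcal{T}\) indexes track observations, \(\mathbf{X}_j\) is the 3D point of track \(j\) (optimized separately from the Gaussian parameters), \((R_i, \mathbf{t}_i)\) is the pose of camera \(i\), \(\pi\) the camera projection, \(\mathbf{x}_{ij}\) the observed 2D feature location, and \(\rho\) a robust kernel. Under this formulation, pose gradients flow from both terms, so the reprojection term can anchor the poses early in training when the photometric signal is still unreliable.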