We present SyncFix, a framework that enforces cross-view consistency during diffusion-based refinement of reconstructed scenes. SyncFix formulates refinement as a joint latent bridge matching problem, synchronizing distorted and clean representations across multiple views to fix semantic and geometric inconsistencies. In effect, SyncFix learns a joint conditional distribution over multiple views, which enforces consistency throughout the denoising trajectory. Although it is trained only on image pairs, SyncFix generalizes naturally to an arbitrary number of views at inference time. Moreover, reconstruction quality improves with additional views, with diminishing returns at higher view counts. Qualitative and quantitative results demonstrate that SyncFix consistently generates high-quality reconstructions and surpasses current state-of-the-art baselines, even in the absence of clean reference images. When sparse references are available, SyncFix achieves even higher fidelity.