Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two captured frames, which restricts geometric consistency and limits scalability to large or diverse scenes. We propose AnyRecon, a scalable framework for reconstruction from arbitrary and unordered sparse inputs that preserves explicit geometric control while supporting a flexible number of conditioning views. To support long-range conditioning, our method constructs a persistent global scene memory via a prepended capture-view cache, and removes temporal compression to maintain frame-level correspondence under large viewpoint changes. Beyond a better generative model, we also find that the interplay between generation and reconstruction is crucial for large-scale 3D scenes. Thus, we introduce a geometry-aware conditioning strategy that couples generation and reconstruction through an explicit 3D geometric memory and geometry-driven capture-view retrieval. To ensure efficiency, we combine 4-step diffusion distillation with context-window sparse attention to reduce the quadratic cost of full attention. Extensive experiments demonstrate robust and scalable reconstruction across irregular inputs, large viewpoint gaps, and long trajectories.
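To make the combination of a prepended capture-view cache and context-window sparse attention concrete, the following is a minimal sketch of one possible attention mask, not the authors' implementation. The function name `context_window_mask`, its parameters, and the specific rules (every frame attends to all cached capture frames; generated frames additionally attend to neighbors within a fixed frame window; cached frames attend only to each other) are assumptions made for illustration.

```python
import numpy as np

def context_window_mask(num_cache_frames, num_target_frames,
                        tokens_per_frame, window):
    """Boolean mask of shape [N, N]; True means the query token (row)
    may attend to the key token (column). Hypothetical layout:
    cache frames come first, then the frames being generated."""
    num_frames = num_cache_frames + num_target_frames
    idx = np.arange(num_frames)
    is_cache = idx < num_cache_frames

    # Assumed rule 1: every frame may attend to all prepended cache frames.
    to_cache = np.broadcast_to(is_cache[None, :], (num_frames, num_frames))

    # Assumed rule 2: generated frames attend to generated frames
    # that are at most `window` positions away along the trajectory.
    both_targets = (~is_cache)[:, None] & (~is_cache)[None, :]
    near = np.abs(idx[:, None] - idx[None, :]) <= window
    frame_mask = to_cache | (both_targets & near)

    # Expand the frame-level mask to token level.
    return frame_mask.repeat(tokens_per_frame, axis=0) \
                     .repeat(tokens_per_frame, axis=1)

mask = context_window_mask(num_cache_frames=2, num_target_frames=5,
                           tokens_per_frame=3, window=1)
density = mask.mean()  # fraction of query-key pairs that are kept
```

Under these assumptions, the number of allowed pairs per generated frame depends on the cache size and the window rather than on the trajectory length, which is where the saving over full attention would come from.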