Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, and a cornerstone for understanding and interacting with the physical world. Traditional methods achieve high fidelity but rely on slow per-scene optimization or category-specific training, which hinders practical deployment and scalability. Generalizable feed-forward 3D reconstruction has therefore developed rapidly in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: although their geometric output representations are diverse, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. Consequently, we abstract away from representation differences and propose a novel taxonomy centered on model design strategies that are agnostic to the output format. The taxonomy organizes recent research into five key problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models. To ground this taxonomy empirically and support standardized evaluation, we comprehensively review related benchmarks and datasets, and discuss and categorize real-world applications built on feed-forward 3D models. Finally, we outline future directions for addressing open challenges such as scalability, evaluation standards, and world modeling.