Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce the complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis. To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets. Rather than relying on a single model component, we develop a system-level design for real-world AV data that combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a robust training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. Within this system, SparseViewDiT is explicitly designed to address limited-angle views and other real-world data challenges. Together with hybrid data curation, augmentation, and self-distillation, this system enables scalable conversion of sparse AV object observations into reusable 3D assets.