Digitising the 3D world into a clean, CAD model-based representation has important applications in augmented reality and robotics. Current state-of-the-art methods are computationally intensive because they encode each detected object individually and optimise CAD alignments in a second stage. In this work, we propose FastCAD, a real-time method that simultaneously retrieves and aligns CAD models for all objects in a given scene. In contrast to previous works, we directly predict alignment parameters and shape embeddings. We achieve high-quality shape retrievals by learning CAD embeddings in a contrastive learning framework and distilling them into FastCAD. Our single-stage method reduces inference time by a factor of 50 compared to other methods operating on RGB-D scans while outperforming them on the challenging Scan2CAD alignment benchmark. Further, our approach integrates seamlessly with online 3D reconstruction techniques, which enables the real-time generation of precise CAD model-based reconstructions from videos at 10 FPS. In doing so, we significantly improve the Scan2CAD alignment accuracy in the video setting from 43.0% to 48.2% and the reconstruction accuracy from 22.9% to 29.6%.
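To illustrate what retrieval from directly predicted shape embeddings could look like, the following is a minimal sketch and not the FastCAD implementation. It assumes that each detected object comes with a predicted embedding vector, that every CAD model in the database has a precomputed embedding of the same dimension, and that retrieval is a nearest-neighbour lookup under cosine similarity. The function name `retrieve_cad`, the similarity measure, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def retrieve_cad(pred_embeddings, cad_embeddings):
    """Match each predicted object embedding to its most similar CAD embedding.

    Hypothetical helper: cosine similarity is an assumption made for this
    sketch, not a detail confirmed by the abstract.
    """
    # L2-normalise rows so that the dot product equals cosine similarity.
    pred = pred_embeddings / np.linalg.norm(pred_embeddings, axis=1, keepdims=True)
    cad = cad_embeddings / np.linalg.norm(cad_embeddings, axis=1, keepdims=True)

    # Similarity matrix of shape (num_objects, num_cad_models).
    similarity = pred @ cad.T

    # Best-matching CAD index and its similarity score for every object.
    return similarity.argmax(axis=1), similarity.max(axis=1)


# Toy database of three CAD embeddings and two predicted object embeddings.
cad_db = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
preds = np.array([[0.9, 0.1, 0.0],
                  [0.1, 0.2, 0.8]])

indices, scores = retrieve_cad(preds, cad_db)
print(indices)
```

Because all objects in the scene are matched with a single matrix product, a lookup of this kind needs no per-object encoding pass, which is in the spirit of the single-stage design described above.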