DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image

Perceiving 3D structures from RGB images based on CAD model primitives can enable an effective, efficient 3D object-based representation of scenes. However, current approaches rely on supervision from expensive annotations of CAD models associated with real images, and encounter challenges due to the inherent ambiguities in the task: both the depth-scale ambiguity in monocular perception and the inexact matches of CAD database models to real observations. We thus propose DiffCAD, the first weakly-supervised probabilistic approach to CAD retrieval and alignment from an RGB image. We formulate this as a conditional generative task, leveraging diffusion to learn implicit probabilistic models capturing the shape, pose, and scale of CAD objects in an image. This enables multi-hypothesis generation of different plausible CAD reconstructions, requiring only a few hypotheses to characterize ambiguities in depth/scale and inexact shape matches. Our approach is trained only on synthetic data, leveraging monocular depth and mask estimates to enable robust zero-shot adaptation to various real target domains. Despite being trained solely on synthetic data, our multi-hypothesis approach can even surpass the supervised state-of-the-art on the Scan2CAD dataset by 5.9% with 8 hypotheses.

