DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image

Perceiving 3D structures from RGB images based on CAD model primitives can enable an effective, efficient 3D object-based representation of scenes. However, current approaches rely on supervision from expensive annotations of CAD models associated with real images, and they struggle with the ambiguities inherent in the task: both the depth-scale ambiguity of monocular perception and the inexact matches between CAD database models and real observations. We thus propose DiffCAD, the first weakly-supervised probabilistic approach to CAD retrieval and alignment from an RGB image. We formulate this as a conditional generative task, leveraging diffusion to learn implicit probabilistic models capturing the shape, pose, and scale of CAD objects in an image. This enables multi-hypothesis generation of different plausible CAD reconstructions, requiring only a few hypotheses to characterize ambiguities in depth/scale and inexact shape matches. Our approach is trained only on synthetic data, leveraging monocular depth and mask estimates to enable robust zero-shot adaptation to various real target domains. Despite being trained solely on synthetic data, our multi-hypothesis approach can even surpass the supervised state-of-the-art on the Scan2CAD dataset by 5.9% with 8 hypotheses.
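The depth-scale ambiguity mentioned above can be illustrated with a minimal sketch, which is not part of DiffCAD itself. It assumes an ideal pinhole camera; the focal length, the corner coordinates, and the helper names `project` and `scale_scene` are hypothetical values chosen only for illustration. Under this model, an object and a copy that is twice as large and twice as far away project to the same pixels, so a single RGB image cannot tell them apart.

```python
import math

def project(point, focal=500.0):
    """Pinhole projection of a camera-space point (X, Y, Z) to image-plane offsets."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def scale_scene(points, s):
    """Scale object size and distance from the camera by the same factor s."""
    return [(s * x, s * y, s * z) for x, y, z in points]

# Hypothetical object corners in camera coordinates (metres).
corners = [(-0.5, -0.5, 2.0), (0.5, -0.5, 2.0), (0.5, 0.5, 3.0), (-0.5, 0.5, 3.0)]

small_near = [project(p) for p in corners]
large_far = [project(p) for p in scale_scene(corners, 2.0)]

# Both configurations yield the same image-plane coordinates.
same_image = all(
    math.isclose(a, b)
    for pa, pb in zip(small_near, large_far)
    for a, b in zip(pa, pb)
)
print(same_image)
```

This is one reason a probabilistic, multi-hypothesis formulation is a natural fit: several scale and depth combinations can explain the same observation equally well.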
