Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization

Reverse engineering 3D computer-aided design (CAD) models from images is an important task for many downstream applications, including interactive editing, manufacturing, architecture, and robotics. The difficulty of the task lies in the vast representational disparity between the CAD output and the image input. CAD models are precise, programmatic constructs that involve sequential operations combining a discrete command structure with continuous attributes, which makes them challenging to learn and optimize in an end-to-end fashion. At the same time, input images introduce inherent challenges such as photometric variability and sensor noise, complicating the reverse engineering process. In this work, we introduce a novel approach that conditionally factorizes the task into two sub-problems. First, we leverage large foundation models, particularly GPT-4V, to predict the global discrete base structure with semantic information. Second, we propose TrAssembler, which, conditioned on the discrete structure with semantics, predicts the continuous attribute values. To support the training of TrAssembler, we further constructed an annotated CAD dataset of common objects from ShapeNet. Taken together, our approach and data demonstrate significant first steps towards CAD-ifying images in the wild. Our project page: https://anonymous123342.github.io/
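The two-stage factorization described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Command` and `Part` classes, the command names, and both stub predictors are hypothetical stand-ins chosen for illustration. The first stage stands in for the foundation-model call that proposes a discrete structure with semantic labels, and the second stands in for a learned model such as TrAssembler that fills in continuous values conditioned on that structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Command:
    op: str          # discrete command type (hypothetical names, e.g. "extrude")
    n_params: int    # number of continuous attributes this command needs
    params: List[float] = field(default_factory=list)  # filled in by stage 2


@dataclass
class Part:
    label: str                 # semantic label attached to the discrete structure
    commands: List[Command]


def stub_structure_predictor(image) -> List[Part]:
    """Stand-in for stage 1: returns a fixed discrete structure with semantics."""
    return [
        Part("seat", [Command("sketch_rect", 4), Command("extrude", 1)]),
        Part("leg", [Command("sketch_circle", 3), Command("extrude", 1)]),
    ]


def stub_attribute_predictor(image, parts: List[Part]) -> List[List[List[float]]]:
    """Stand-in for stage 2: one placeholder value per continuous slot."""
    return [[[0.5] * cmd.n_params for cmd in part.commands] for part in parts]


def reconstruct(image, predict_structure, predict_attributes) -> List[Part]:
    """Run stage 1, then stage 2 conditioned on the stage-1 output."""
    parts = predict_structure(image)
    values = predict_attributes(image, parts)
    for part, part_values in zip(parts, values):
        for cmd, cmd_values in zip(part.commands, part_values):
            if len(cmd_values) != cmd.n_params:
                raise ValueError(f"{cmd.op} expects {cmd.n_params} values")
            cmd.params = list(cmd_values)
    return parts


# The image is unused by the stubs, so None is enough for this illustration.
model = reconstruct(None, stub_structure_predictor, stub_attribute_predictor)
```

The point of the split is that the second stage never has to choose command types or part counts; it only regresses numbers into slots whose shape is already fixed by the first stage.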

