In this paper, we present CAD2Program, a new method for reconstructing 3D parametric models from 2D CAD drawings. Our method is inspired by the recent success of vision-language models (VLMs) and departs from traditional methods that rely on task-specific data representations and/or algorithms. On the input side, we treat the 2D CAD drawing as a raster image, regardless of its original format, and encode it with a standard ViT model. We show that this encoding scheme achieves competitive performance against existing methods that operate on vector-graphics inputs, while imposing substantially fewer restrictions on the 2D drawings. On the output side, our method autoregressively generates a description of the 3D parametric model as text in a general-purpose language. Compared to other sequence modeling methods for CAD, which use domain-specific sequence representations with fixed-size slots, our text-based representation is more flexible and can be easily extended to arbitrary geometric entities and to semantic or functional properties. Experimental results on a large-scale dataset of cabinet models demonstrate the effectiveness of our method.
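As a rough illustration of the input side: a standard ViT begins by cutting the raster image into fixed-size, non-overlapping patches and flattening each one into a token. The minimal sketch below shows only that step on a toy grayscale grid. The function name `patchify`, the 4x4 toy drawing, and the patch size are illustrative assumptions, not details from the paper; the learned linear projection and position embeddings that follow in a real ViT are omitted.

```python
def patchify(image, patch):
    """Split a 2D grayscale raster (list of rows) into flattened patch x patch tiles.

    Tiles are returned in row-major order, which is the token order a
    ViT-style encoder would typically consume.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile row by row into a single list of pixel values.
            tile = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            tokens.append(tile)
    return tokens


# Toy 4x4 "drawing" (hypothetical): one vertical stroke in column 1.
drawing = [[1 if x == 1 else 0 for x in range(4)] for y in range(4)]
tokens = patchify(drawing, 2)  # four tokens, each holding four pixel values
```

Because this step needs only pixels, any drawing that can be rendered to an image can be fed to the encoder, whatever its original file format.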