BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

Industrial Computer-Aided Design (CAD) code generation requires models to produce executable parametric programs from visual or textual inputs. Beyond recognizing the outer shape of a part, this task involves understanding its 3D structure, inferring engineering parameters, and choosing CAD operations that reflect how the part would be designed and manufactured. Despite the promise of multimodal large language models (MLLMs) for this task, they are rarely evaluated on whether these capabilities jointly hold in realistic industrial CAD settings. We present BenchCAD, a unified benchmark for industrial CAD reasoning. BenchCAD contains 17,900 execution-verified CadQuery programs across 106 industrial part families, including bevel gears, compression springs, twist drills, and other reusable engineering designs. It evaluates models through visual question answering, code question answering, image-to-code generation, and instruction-guided code editing, enabling fine-grained analysis across perception, parametric abstraction, and executable program synthesis. Across 10+ frontier models, BenchCAD shows that current systems often recover coarse outer geometry but fail to produce faithful parametric CAD programs. Common failures include missing fine 3D structure, misinterpreting industrial design parameters, and replacing essential operations such as sweeps, lofts, and twist-extrudes with simpler sketch-and-extrude patterns. Fine-tuning and reinforcement learning improve in-distribution performance, but generalization to unseen part families remains limited. These results position BenchCAD as a benchmark for measuring and improving the industrial readiness of multimodal CAD automation.
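To make the operation-substitution failure concrete, the sketch below compares the CadQuery method calls in a reference program with those in a generated one and reports which advanced operations were dropped. It is only an illustration and not BenchCAD's evaluation code: the grouping in `ADVANCED_OPS`, the helper names, and the two example programs are assumptions made here. The programs are parsed as text with Python's `ast` module, so CadQuery itself does not need to be installed.

```python
import ast

# CadQuery Workplane methods treated as "advanced" solid-building operations.
# This grouping is an illustrative assumption, not a taxonomy from BenchCAD.
ADVANCED_OPS = {"sweep", "loft", "twistExtrude", "revolve"}


def called_methods(source: str) -> set[str]:
    """Return the names of all attribute-style calls (obj.method(...)) in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            names.add(node.func.attr)
    return names


def missing_advanced_ops(reference: str, generated: str) -> set[str]:
    """Advanced operations used by the reference but absent from the generated program."""
    return (called_methods(reference) & ADVANCED_OPS) - called_methods(generated)


# Hypothetical reference program: a rectangular profile extruded with a twist.
reference_src = '''
import cadquery as cq
result = cq.Workplane("XY").rect(4, 1).twistExtrude(20, 180)
'''

# Hypothetical model output: same profile, but a plain sketch-and-extrude.
generated_src = '''
import cadquery as cq
result = cq.Workplane("XY").rect(4, 1).extrude(20)
'''

if __name__ == "__main__":
    print(missing_advanced_ops(reference_src, generated_src))
```

A check like this only looks at which operations appear in the code. It says nothing about whether the resulting geometry or parameters match, which would require executing both programs and comparing the solids.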
