Computer-Aided Design (CAD) is pivotal in modern manufacturing, yet existing automated methods rely predominantly on open-loop, one-shot generation, which is at odds with the iterative way designs are developed in practice. In this paper, we present IterCAD, a unified multimodal agent framework for closed-loop, interactive CAD generation and editing. We formulate the problem as a multi-turn interaction between a multimodal agent and an executable CAD sandbox, covering three tasks: Drawing-to-Code, Text-to-Code, and Interactive Editing. To support this formulation, we develop a data synthesis pipeline that incorporates advanced industrial manufacturing features and generates standard-compliant multi-view engineering drawings, complex code-editing tasks, and high-fidelity interaction trajectories. We optimize the agent with progressive supervised fine-tuning (SFT) followed by geometry-aware reinforcement learning with viable-prefix masking, which improves code executability and geometric fidelity. Finally, we introduce the IterCAD-Bench evaluation suite and propose the Chamfer Distance Tolerance-Recall (CD-TR) curve together with its AUC-TR metric, establishing a survivor-bias-free standard that unifies code validity and geometric precision. Extensive experiments show that IterCAD outperforms existing approaches across multiple benchmarks in both code executability and geometric precision, and that it is more effective at closed-loop iterative refinement.
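The abstract does not spell out how the CD-TR curve is computed, so the following is only a minimal sketch of one plausible reading, not the paper's definition. It assumes that recall at tolerance τ is the fraction of all test samples whose Chamfer distance is at most τ, that samples whose code fails to execute stay in the denominator as misses (which is what would make the metric free of survivor bias), and that AUC-TR is the trapezoidal area under the curve normalized by the tolerance range. The function names `cd_tr_curve` and `auc_tr` and the example values are hypothetical.

```python
from typing import List, Optional, Sequence


def cd_tr_curve(
    chamfer: Sequence[Optional[float]], tolerances: Sequence[float]
) -> List[float]:
    """Recall at each tolerance (assumed definition, not taken from the paper).

    `chamfer` holds one Chamfer distance per test sample; None marks a sample
    whose generated code did not execute. Failed samples remain in the
    denominator, so they lower recall at every tolerance.
    """
    total = len(chamfer)
    if total == 0:
        raise ValueError("need at least one sample")
    return [
        sum(1 for d in chamfer if d is not None and d <= tau) / total
        for tau in tolerances
    ]


def auc_tr(tolerances: Sequence[float], recalls: Sequence[float]) -> float:
    """Trapezoidal area under the curve, normalized to [0, 1] (assumed)."""
    if len(tolerances) < 2 or len(tolerances) != len(recalls):
        raise ValueError("need matching sequences with at least two points")
    area = 0.0
    for i in range(1, len(tolerances)):
        width = tolerances[i] - tolerances[i - 1]
        area += width * (recalls[i] + recalls[i - 1]) / 2.0
    return area / (tolerances[-1] - tolerances[0])


# Hypothetical data: four samples, one of which failed to execute.
distances = [0.001, 0.004, None, 0.02]
taus = [0.0, 0.005, 0.01, 0.05]
recalls = cd_tr_curve(distances, taus)
score = auc_tr(taus, recalls)
```

Under these assumptions, a model that produces many non-executable programs cannot score well by being accurate only on the programs that happen to run, because every failure caps the recall the curve can reach.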