In this work, we introduce CAD-Coder, a novel framework that reformulates text-to-CAD as the generation of scripts in CadQuery, a Python-based parametric CAD language. This representation enables direct geometric validation, a richer modeling vocabulary, and seamless integration with existing LLMs. To further enhance code validity and geometric fidelity, we propose a two-stage learning pipeline: (1) supervised fine-tuning on paired text-CadQuery data, and (2) reinforcement learning with Group Relative Policy Optimization (GRPO), guided by a CAD-specific reward comprising both a geometric reward (based on Chamfer Distance) and a format reward. We also introduce a chain-of-thought (CoT) planning process to improve model reasoning, and construct a large-scale, high-quality dataset of 110K text-CadQuery-3D model triplets and 1.5K CoT samples via an automated pipeline. Extensive experiments demonstrate that CAD-Coder enables LLMs to generate diverse, valid, and complex CAD models directly from natural language, advancing the state of the art in text-to-CAD generation and geometric reasoning.
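To make the reward design concrete, the following is a minimal sketch of how a CAD-specific reward and GRPO-style group-relative advantages could be computed. It is an illustration under stated assumptions, not the paper's implementation: the squared-distance form of Chamfer Distance, the exponential mapping from distance to reward, the `<think>` tag convention, the reward weights, and all function names are hypothetical choices made for this example. Point clouds are assumed to have already been sampled from the generated and reference 3D models.

```python
import math
import re
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance (squared-distance variant, one of several in use).

    For each point in one cloud, take the squared distance to its nearest
    neighbor in the other cloud, average, and sum both directions.
    """
    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(
            min(sum((s - d) ** 2 for s, d in zip(p, q)) for q in dst)
            for p in src
        ) / len(src)

    return one_way(a, b) + one_way(b, a)


def geometric_reward(pred: Optional[Sequence[Point]],
                     target: Sequence[Point],
                     scale: float = 1.0) -> float:
    """Map Chamfer Distance to (0, 1]; hypothetical exp(-CD / scale) shaping.

    `pred` is None or empty when the generated script failed to execute,
    in which case no geometry exists and the reward is 0.
    """
    if not pred:
        return 0.0
    return math.exp(-chamfer_distance(pred, target) / scale)


# Assumed output convention: a <think>...</think> plan followed by a script
# that imports cadquery. The real format check may differ.
FORMAT_PATTERN = re.compile(r"^<think>.+?</think>.*import cadquery", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed format, else 0.0."""
    return 1.0 if FORMAT_PATTERN.search(completion.strip()) else 0.0


def total_reward(completion: str,
                 pred: Optional[Sequence[Point]],
                 target: Sequence[Point],
                 w_geo: float = 1.0,
                 w_fmt: float = 0.5) -> float:
    """Weighted sum of geometric and format rewards (weights are assumptions)."""
    return w_geo * geometric_reward(pred, target) + w_fmt * format_reward(completion)


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In use, several completions would be sampled for one prompt, each scored with `total_reward`, and the resulting list passed to `group_relative_advantages`. Completions that score above the group mean receive positive advantages, which is the sense in which the policy update is relative to the group and needs no separate value model.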