Recently, there has been growing interest in creating computer-aided design (CAD) models based on user intent, known as controllable CAD generation. Existing work offers limited controllability and needs separate models for different types of control, reducing efficiency and practicality. To achieve controllable generation across all CAD construction hierarchies, such as sketch-extrusion, extrusion, sketch, face, loop and curve, we propose FlexCAD, a unified model built by fine-tuning large language models (LLMs). First, to enhance comprehension by LLMs, we represent a CAD model as structured text by abstracting each hierarchy as a sequence of text tokens. Second, to address various controllable generation tasks in a unified model, we introduce a hierarchy-aware masking strategy. Specifically, during training, we replace a hierarchy-aware field in the CAD text with a mask token. This field, composed of a sequence of tokens, can be set flexibly to represent various hierarchies. We then ask the LLM to predict the masked field. During inference, the user intent is converted into a CAD text in which a mask token replaces the part the user wants to modify; this text is then fed into FlexCAD to generate new CAD models. Comprehensive experiments on publicly available data demonstrate the effectiveness of FlexCAD in both generation quality and controllability. Code will be available at https://github.com/microsoft/CADGeneration/FlexCAD.
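The hierarchy-aware masking strategy described above can be illustrated with a minimal sketch. It is not the FlexCAD implementation: the mask token `<mask>`, the separators, the curve strings, and the helper names `ser`, `to_text`, and `mask_field` are all hypothetical choices made for illustration. A toy CAD model is stored as a nested sketch (faces, loops, curves) plus one extrusion, and a single field at the chosen hierarchy is replaced by the mask token, yielding a (masked text, target) training pair.

```python
import copy
import random

MASK = "<mask>"  # hypothetical mask token; the actual token may differ
SEPS = (" | ", " ; ", " , ")  # assumed separators: faces, loops, curves
LEVELS = ("sketch-extrusion", "extrusion", "sketch", "face", "loop", "curve")
DEPTH = {"face": 1, "loop": 2, "curve": 3}  # nesting depth inside a sketch


def ser(node, seps=SEPS):
    """Serialize a nested sketch node; strings (curves or MASK) pass through."""
    if isinstance(node, str):
        return node
    return seps[0].join(ser(child, seps[1:]) for child in node)


def to_text(sketch, extrusion):
    """Join a sketch and an extrusion into one CAD text (assumed layout)."""
    return f"{ser(sketch)} >> {extrusion}"


def mask_field(model, level, rng):
    """Replace one field at `level` with MASK; return (masked_text, target)."""
    sketch, extrusion = copy.deepcopy(model["sketch"]), model["extrusion"]
    if level == "sketch-extrusion":
        return MASK, to_text(sketch, extrusion)
    if level == "extrusion":
        return to_text(sketch, MASK), extrusion
    if level == "sketch":
        return to_text(MASK, extrusion), ser(sketch)
    # Walk down to the parent of a randomly chosen face, loop, or curve.
    parent, seps = sketch, SEPS
    for _ in range(DEPTH[level] - 1):
        parent, seps = rng.choice(parent), seps[1:]
    i = rng.randrange(len(parent))
    target = ser(parent[i], seps[1:])
    parent[i] = MASK  # mutates the deep copy, not the caller's model
    return to_text(sketch, extrusion), target


# Toy model: one face with an outer rectangular loop and an inner circle.
model = {
    "sketch": [[
        ["line 0 0 10 0", "line 10 0 10 5", "line 10 5 0 5", "line 0 5 0 0"],
        ["circle 5 2 1"],
    ]],
    "extrusion": "extrude 4",
}

rng = random.Random(0)
for level in LEVELS:
    masked, target = mask_field(model, level, rng)
    print(f"{level:17s} input: {masked}\n{'':17s} label: {target}")
```

Under these assumptions, inference uses the same text format: the part the user wants to change is replaced by the mask token, and the model's prediction is substituted back in place of the mask to obtain a new CAD text.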