We propose CAD-Assistant, a general-purpose CAD agent for AI-assisted design. Our approach uses a Vision and Large Language Model (VLLM) as a planner and augments it with CAD-specific tools. CAD-Assistant addresses multimodal user queries by generating actions that are executed iteratively on a Python interpreter equipped with FreeCAD, accessed via its Python API. The framework assesses the impact of each generated CAD command on the geometry and adapts subsequent actions to the evolving state of the CAD design. We consider a wide range of CAD-specific tools, including Python libraries, modules of the FreeCAD Python API, helper routines, rendering functions, and other specialized modules. We evaluate our method on multiple CAD benchmarks and qualitatively demonstrate the potential of tool-augmented VLLMs as generic CAD task solvers across diverse CAD workflows.
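The plan, execute, and observe cycle described above can be illustrated with a minimal sketch. This is not the authors' implementation: `run_agent` and `scripted_planner` are hypothetical names, the planner is a scripted stand-in for the VLLM, and a plain dictionary stands in for the FreeCAD document so that the example runs without FreeCAD installed. It only shows how actions executed in a persistent interpreter namespace can produce observations that feed back into the next planning step.

```python
import contextlib
import io
from typing import Callable, Optional


def run_agent(planner: Callable[[str, list], Optional[str]],
              query: str,
              namespace: dict,
              max_steps: int = 5) -> list:
    """Plan, execute, observe until the planner returns None or max_steps is hit."""
    history = []  # (action, observation) pairs fed back to the planner
    for _ in range(max_steps):
        action = planner(query, history)
        if action is None:
            break
        buffer = io.StringIO()
        try:
            # The namespace persists across steps, so it carries the evolving state.
            with contextlib.redirect_stdout(buffer):
                exec(action, namespace)
            observation = buffer.getvalue().strip()
        except Exception as exc:
            # Errors are reported back as observations instead of aborting the loop.
            observation = f"error: {exc!r}"
        history.append((action, observation))
    return history


def scripted_planner(query, history):
    """Hypothetical stand-in for the VLLM: replays a fixed plan, one action per step."""
    plan = [
        "sketch['lines'].append(((0, 0), (10, 0)))",
        "print(len(sketch['lines']))",
    ]
    return plan[len(history)] if len(history) < len(plan) else None


# Assumption: a plain dict stands in for a CAD document.
state = {"sketch": {"lines": []}}
trace = run_agent(scripted_planner, "add a horizontal line", state)
```

In a real setting, the namespace would presumably expose the FreeCAD modules and the other tools listed in the abstract, and the observation could include renderings as well as text output.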