Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs

Automatic feature recognition (AFR) is essential for transforming design knowledge into actionable manufacturing information. Traditional AFR methods, which rely on predefined geometric rules and large datasets, are often time-consuming and lack generalizability across various manufacturing features. To address these challenges, this study investigates vision-language models (VLMs) for automating the recognition of a wide range of manufacturing features in CAD designs without the need for extensive training datasets or predefined rules. Instead, prompt engineering techniques, such as multi-view query images, few-shot learning, sequential reasoning, and chain-of-thought, are applied to enable recognition. The approach is evaluated on a newly developed CAD dataset containing designs of varying complexity relevant to machining, additive manufacturing, sheet metal forming, molding, and casting. Five VLMs, including three closed-source models (GPT-4o, Claude-3.5-Sonnet, and Claude-3.0-Opus) and two open-source models (LLaVA and MiniCPM), are evaluated on this dataset with ground truth features labelled by experts. Key metrics include feature quantity accuracy, feature name matching accuracy, hallucination rate, and mean absolute error (MAE). Results show that Claude-3.5-Sonnet achieves the highest feature quantity accuracy (74%) and name-matching accuracy (75%) with the lowest MAE (3.2), while GPT-4o records the lowest hallucination rate (8%). In contrast, open-source models have higher hallucination rates (>30%) and lower accuracies (<40%). This study demonstrates the potential of VLMs to automate feature recognition in CAD designs within diverse manufacturing scenarios.
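The abstract names four evaluation metrics but does not define them. The sketch below shows one plausible way to compute them for a single CAD design, and it should be read as an illustration rather than the authors' procedure. Every definition here is an assumption: the `evaluate` function, the dictionary format mapping feature names to counts, the choice of denominators, and the sample feature names are all hypothetical.

```python
from typing import Dict


def evaluate(predicted: Dict[str, int], truth: Dict[str, int]) -> Dict[str, float]:
    """Score one design. Inputs map a feature name to how many times it occurs.

    All four definitions are assumptions made for illustration; the paper
    may define its metrics differently.
    """
    pred_names = set(predicted)
    true_names = set(truth)
    matched = pred_names & true_names

    # Assumed: share of ground-truth feature names that the model also reported.
    name_accuracy = len(matched) / len(true_names) if true_names else 0.0

    # Assumed: share of ground-truth features whose predicted count is exactly right.
    exact = sum(1 for name in matched if predicted[name] == truth[name])
    quantity_accuracy = exact / len(true_names) if true_names else 0.0

    # Assumed: share of predicted feature names absent from the ground truth.
    hallucinated = pred_names - true_names
    hallucination_rate = len(hallucinated) / len(pred_names) if pred_names else 0.0

    # Assumed: mean absolute count error over every name seen on either side,
    # treating a missing name as a count of zero.
    all_names = pred_names | true_names
    total_error = sum(abs(predicted.get(n, 0) - truth.get(n, 0)) for n in all_names)
    mae = total_error / len(all_names) if all_names else 0.0

    return {
        "quantity_accuracy": quantity_accuracy,
        "name_accuracy": name_accuracy,
        "hallucination_rate": hallucination_rate,
        "mae": mae,
    }


# Hypothetical expert labels and model output for one part.
truth = {"hole": 4, "slot": 2, "fillet": 6}
predicted = {"hole": 4, "slot": 1, "chamfer": 3}

scores = evaluate(predicted, truth)
print(scores)
```

Under these assumed definitions, a model can score well on name matching while still miscounting features, which is why a count-based error such as MAE is reported alongside the accuracy figures.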


