Engineering design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed by the release of multimodal vision-language models (VLMs), such as GPT-4V, which enable AI to impact many more types of tasks. Our work presents a comprehensive evaluation of VLMs across a spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Specifically, in this paper we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, on design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems. Through this structured evaluation, we not only explore VLMs' proficiency in handling complex design challenges but also identify their limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision-language models. It also contributes a set of benchmark testing datasets, comprising more than 1,000 queries, for ongoing advancements and applications in this field.