We provide a dataset to enable Deep Generative Models (DGMs) in engineering design and propose methods to automate data labeling using large-scale foundation models. GeoBiked is curated to contain 4,355 bicycle images annotated with structural and technical features, and is used to investigate two automated labeling techniques: using consolidated latent features (Hyperfeatures) from image-generation models to detect geometric correspondences (e.g., the position of the wheel center) in structural images, and generating diverse text descriptions for structural images. GPT-4o, a vision-language model (VLM), is instructed to analyze images and produce diverse descriptions aligned with the system prompt. Representing technical images as Diffusion-Hyperfeatures makes it possible to draw geometric correspondences between them, and presenting multiple annotated source images improves the detection accuracy of geometric points in unseen samples. GPT-4o is sufficiently capable of generating accurate descriptions of technical images: grounding the generation only on images yields diverse descriptions but causes hallucinations, grounding it only on categorical labels restricts diversity, and using both as input balances creativity and accuracy. The successful use of Hyperfeatures for geometric correspondence suggests that this approach can serve general point-detection and annotation tasks in technical images. Labeling such images with text descriptions using VLMs is possible, but depends on the model's detection capabilities, careful prompt engineering, and the selection of input information. Applying foundation models in engineering design is largely unexplored. We aim to bridge this gap with a dataset for exploring the training, finetuning, and conditioning of DGMs in this field, and by suggesting approaches to bootstrap foundation models to process technical images.
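To make the correspondence idea concrete, the following is a minimal sketch of how an annotated point can be transferred between images via nearest-neighbor matching in a per-pixel feature space. It assumes hyperfeature maps have already been extracted from a diffusion model (extraction is omitted); the function names and the averaging over multiple annotated sources are illustrative assumptions, not the paper's exact pipeline.

```python
# Sketch: point transfer via cosine similarity over per-pixel features.
# Assumes src_feats / tgt_feats are (H, W, C) hyperfeature maps already
# extracted from a diffusion model for the source and target images.
import numpy as np

def transfer_point(src_feats, tgt_feats, src_point):
    """Map an annotated (row, col) point from a source image to the
    target image by nearest-neighbor cosine similarity in feature space."""
    r, c = src_point
    query = src_feats[r, c]                           # (C,)
    flat = tgt_feats.reshape(-1, tgt_feats.shape[-1])  # (H*W, C)
    sims = flat @ query / (
        np.linalg.norm(flat, axis=1) * np.linalg.norm(query) + 1e-8
    )
    idx = int(np.argmax(sims))
    _, w = tgt_feats.shape[:2]
    return divmod(idx, w)  # (row, col) in the target feature map

def transfer_from_sources(sources, tgt_feats):
    """Average the predictions of several annotated source images; in the
    paper's setting, multiple sources improve accuracy on unseen samples."""
    preds = [transfer_point(f, tgt_feats, p) for f, p in sources]
    return tuple(np.mean(preds, axis=0))
```

In practice the feature maps are lower-resolution than the image, so the predicted location would be scaled back to pixel coordinates.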
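The finding that grounding GPT-4o on both the image and the categorical labels balances creativity and accuracy can be sketched as a request-construction step. The system prompt, label names, and helper below are illustrative assumptions, not the paper's exact setup; only the message format follows the OpenAI chat-completions vision API.

```python
# Sketch: assembling a GPT-4o request that grounds the description on
# both the image and categorical labels. Prompt wording and label keys
# are hypothetical placeholders.
import base64

SYSTEM_PROMPT = (
    "You are an engineering assistant. Describe the bicycle in the image "
    "in one varied, technically accurate sentence."
)

def build_messages(image_bytes, labels):
    """Assemble chat messages: the image plus categorical labels as text."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    label_text = ", ".join(f"{k}: {v}" for k, v in labels.items())
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": [
            {"type": "text",
             "text": f"Known labels for this image: {label_text}."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]},
    ]
```

The resulting messages would then be sent to the model, e.g. via `client.chat.completions.create(model="gpt-4o", messages=...)`. Dropping the label text from the user message reproduces the image-only condition that the paper finds more diverse but prone to hallucination.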