Improving Supervised Machine Learning Performance in Optical Quality Control via Generative AI for Dataset Expansion

Supervised machine learning algorithms play a crucial role in optical quality control within industrial production. These approaches require representative datasets for effective model training. However, while non-defective components are frequent, defective parts are rare in production, resulting in highly imbalanced datasets that adversely impact model performance. Existing strategies to address this challenge, such as specialized loss functions or traditional data augmentation techniques, have limitations, including the need for careful hyperparameter tuning or the alteration of only simple image features. Therefore, this work explores the potential of generative artificial intelligence (GenAI) as an alternative method for expanding limited datasets and enhancing supervised machine learning performance. Specifically, we investigate Stable Diffusion and CycleGAN as image generation models, focusing on the segmentation of combine harvester components in thermal images for subsequent defect detection. Our results demonstrate that dataset expansion using Stable Diffusion yields the most significant improvement, enhancing segmentation performance by 4.6 %, resulting in a Mean Intersection over Union (Mean IoU) of 84.6 %.
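
The Mean IoU figure reported above is a per-class overlap score averaged over classes. The abstract does not say how the authors compute or average it, so the following is only a minimal sketch of one common definition, with the function name `mean_iou`, the toy masks, and the choice to skip classes absent from both masks all being illustrative assumptions rather than the paper's method.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two label masks.

    pred and target are equally sized 2D lists of integer class labels.
    Classes that appear in neither mask are skipped (an assumption;
    other conventions count them as IoU 0 or 1).
    """
    inter = [0] * num_classes  # pixels where both masks agree on class c
    union = [0] * num_classes  # pixels where either mask has class c
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 masks with three classes (hypothetical data, not from the paper).
pred = [[0, 0, 1],
        [1, 1, 2]]
target = [[0, 1, 1],
          [1, 1, 2]]
score = mean_iou(pred, target, num_classes=3)
print(f"Mean IoU: {score:.3f}")
```

In this toy case the per-class IoUs are 0.5, 0.75, and 1.0, so the mean is 0.75. On real segmentation masks an array library would normally replace the nested loops, but the arithmetic is the same.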

