Automatic optical inspection (AOI) plays a pivotal role in manufacturing, primarily using high-resolution imaging instruments to scan products. It detects anomalies by analyzing image textures and patterns, making it an essential tool for industrial manufacturing and quality control. Despite its importance, deploying models for AOI often faces challenges: limited sample sizes that hinder effective feature learning, variation among source domains, and sensitivity to changes in lighting and camera position during imaging. These factors collectively degrade the accuracy of model predictions. Traditional AOI also often fails to exploit the rich mechanism-parameter information from machines or embedded within images, including statistical parameters, which typically benefits AOI classification. To address this, we introduce an external modality-guided data mining framework, rooted primarily in optical character recognition (OCR), that extracts statistical features from images as a second modality to enhance performance; we term it OANet (Ocr-Aoi-Net). A key aspect of our approach is aligning the external-modality features, extracted by a single modality-aware model, with the image features encoded by a convolutional neural network. This synergy enables a more refined fusion of semantic representations from the two modalities. We further introduce feature refinement and a gating function in OANet to optimize how these features are combined, strengthening inference and decision-making. Experimental results show that our method considerably boosts the recall rate of the defect detection model and remains highly robust even in challenging scenarios.
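The gated fusion described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all shapes, parameter names, and the exact gate form (a per-dimension sigmoid gate blending the two modality embeddings) are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(img_feat, ocr_feat, W_g, b_g):
    """Blend two modality features with a learned per-dimension gate.

    g = sigmoid(W_g @ [img; ocr] + b_g)
    fused = g * img + (1 - g) * ocr
    """
    z = np.concatenate([img_feat, ocr_feat])
    g = sigmoid(W_g @ z + b_g)
    return g * img_feat + (1.0 - g) * ocr_feat

d = 8                                   # shared feature dimension (assumed)
img_feat = rng.standard_normal(d)       # stand-in for the CNN image embedding
ocr_feat = rng.standard_normal(d)       # stand-in for the OCR statistical-feature embedding
W_g = rng.standard_normal((d, 2 * d)) * 0.1   # hypothetical gate weights
b_g = np.zeros(d)

fused = gated_fusion(img_feat, ocr_feat, W_g, b_g)
print(fused.shape)  # (8,)
```

Because the gate values lie in (0, 1), each fused dimension is a convex combination of the corresponding image and OCR feature values, so neither modality can be pushed outside the range the two inputs span.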