CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset

Machine learning models are increasingly being deployed in real-world contexts. However, systematic studies on their transferability to specific and critical applications are underrepresented in the research literature. An important example is visual anomaly detection (VAD) for robotic power line inspection. While existing VAD methods perform well in controlled environments, real-world scenarios present diverse and unexpected anomalies that current datasets fail to capture. To address this gap, we introduce $\textit{CableInspect-AD}$, a high-quality, publicly available dataset created and annotated by domain experts from Hydro-Québec, a Canadian public utility. This dataset includes high-resolution images with challenging real-world anomalies, covering defects with varying severity levels. To address the challenges of collecting diverse anomalous and nominal examples for setting a detection threshold, we propose an enhancement to the celebrated PatchCore algorithm. This enhancement enables its use in scenarios with limited labeled data. We also present a comprehensive evaluation protocol based on cross-validation to assess model performance. We evaluate our $\textit{Enhanced-PatchCore}$ for few-shot and many-shot detection, and Vision-Language Models for zero-shot detection. While promising, these models struggle to detect all anomalies, highlighting the dataset's value as a challenging benchmark for the broader research community. Project page: https://mila-iqia.github.io/cableinspect-ad/.
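
The abstract does not say how the detection threshold is chosen or how the cross-validation folds are built. As a rough illustration only, the sketch below assumes each image already has a scalar anomaly score and a binary label. It picks the threshold that maximizes F1 on the training folds, then scores that threshold on the held-out fold. The function names, the F1 criterion, the interleaved fold split, and the toy scores are all assumptions made for this example, not the Enhanced-PatchCore procedure.

```python
from statistics import mean

def f1_at(threshold, scores, labels):
    """F1 when every score >= threshold is flagged as anomalous (label 1)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def best_threshold(scores, labels):
    """Try each observed score as a threshold; keep the lowest one with the best F1."""
    return max(sorted(set(scores)), key=lambda t: f1_at(t, scores, labels))

def cross_validated_f1(scores, labels, k=3):
    """Fit the threshold on k-1 folds, evaluate on the held-out fold, average over folds."""
    n = len(scores)
    fold_f1 = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # simple interleaved split (assumption)
        train = [(scores[i], labels[i]) for i in range(n) if i not in test_idx]
        test = [(scores[i], labels[i]) for i in range(n) if i in test_idx]
        t = best_threshold([s for s, _ in train], [y for _, y in train])
        fold_f1.append(f1_at(t, [s for s, _ in test], [y for _, y in test]))
    return mean(fold_f1)

# Toy anomaly scores and labels (1 = anomalous); purely illustrative values.
scores = [0.10, 0.85, 0.20, 0.90, 0.15, 0.80, 0.25, 0.95, 0.30]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0]

print("threshold on all data:", best_threshold(scores, labels))
print("cross-validated F1:", cross_validated_f1(scores, labels, k=3))
```

With only a few labeled examples, a threshold fitted on one subset can miss anomalies in another. Reporting results across folds, rather than from a single split, is one way to make that variability visible.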
