The MVTec AD 2 Dataset: Advanced Scenarios for Unsupervised Anomaly Detection

In recent years, performance on existing anomaly detection benchmarks like MVTec AD and VisA has started to saturate in terms of segmentation AU-PRO, with state-of-the-art models often competing in the range of less than one percentage point. This lack of discriminatory power prevents a meaningful comparison of models and thus hinders progress of the field, especially when considering the inherent stochastic nature of machine learning results. We present MVTec AD 2, a collection of eight anomaly detection scenarios with more than 8000 high-resolution images. It comprises challenging and highly relevant industrial inspection use cases that have not been considered in previous datasets, including transparent and overlapping objects, dark-field and back light illumination, objects with high variance in the normal data, and extremely small defects. We provide comprehensive evaluations of state-of-the-art methods and show that their performance remains below 60% average AU-PRO. Additionally, our dataset provides test scenarios with lighting condition changes to assess the robustness of methods under real-world distribution shifts. We host a publicly accessible evaluation server that holds the pixel-precise ground truth of the test set (https://benchmark.mvtec.com/). All image data is available at https://www.mvtec.com/company/research/datasets/mvtec-ad-2.
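The abstract reports results as segmentation AU-PRO without defining it. As a rough illustration, the sketch below computes a per-region-overlap (PRO) curve for a single image and integrates it over the false positive rate up to a limit. It is a simplified, single-image sketch and not the evaluation server's implementation: the function names, the use of 4-connectivity, and the default `fpr_limit=0.3` are assumptions made here for illustration, and a real benchmark aggregates over the whole test set with its own fixed limit.

```python
from collections import deque


def connected_regions(mask):
    """Return the 4-connected regions of a binary mask as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, region = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions


def pro_and_fpr(score_map, gt_mask, threshold):
    """PRO = mean fraction of each ground-truth region that is detected.
    FPR = fraction of normal pixels flagged as anomalous."""
    regions = connected_regions(gt_mask)
    overlaps = [sum(score_map[y][x] >= threshold for y, x in reg) / len(reg)
                for reg in regions]
    pro = sum(overlaps) / len(overlaps) if overlaps else 0.0
    normal = [(y, x) for y, row in enumerate(gt_mask)
              for x, v in enumerate(row) if not v]
    false_pos = sum(score_map[y][x] >= threshold for y, x in normal)
    fpr = false_pos / len(normal) if normal else 0.0
    return pro, fpr


def au_pro(score_map, gt_mask, fpr_limit=0.3):
    """Area under the PRO-vs-FPR curve up to fpr_limit, normalized to [0, 1].
    The default limit is an illustrative assumption, not the benchmark's value."""
    thresholds = sorted({v for row in score_map for v in row}, reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for t in thresholds:
        pro, fpr = pro_and_fpr(score_map, gt_mask, t)
        points.append((fpr, pro))
    area = 0.0
    for (f0, p0), (f1, p1) in zip(points, points[1:]):
        if f0 >= fpr_limit:
            break
        if f1 > fpr_limit:
            # Linearly interpolate the curve at the integration limit.
            p1 = p0 + (p1 - p0) * (fpr_limit - f0) / (f1 - f0)
            f1 = fpr_limit
        area += (f1 - f0) * (p0 + p1) / 2  # trapezoidal rule
    return area / fpr_limit
```

Because every ground-truth region contributes equally to PRO regardless of its size, a method that misses extremely small defects is penalized as heavily as one that misses large ones, which is relevant to the small-defect scenarios the abstract mentions.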

