This study explores the recently proposed and challenging multi-view Anomaly Detection (AD) task. Single-view approaches suffer from blind spots in other perspectives, leading to inaccurate sample-level predictions. We therefore introduce the \textbf{M}ulti-\textbf{V}iew \textbf{A}nomaly \textbf{D}etection (\textbf{MVAD}) framework, which learns and integrates features from multiple views. Specifically, we propose a \textbf{M}ulti-\textbf{V}iew \textbf{A}daptive \textbf{S}election (\textbf{MVAS}) algorithm for feature learning and fusion across views. The feature maps are divided into neighbourhood attention windows, and a semantic correlation matrix is computed between each single-view window and the windows of all other views; attention is then conducted between each single-view window and its top-K most correlated multi-view windows. By adjusting the window sizes and top-K, the computational complexity can be reduced to linear. Extensive cross-setting (multi-class and single-class) experiments on the Real-IAD dataset validate the effectiveness of our approach, which achieves state-of-the-art performance at the sample (\textbf{4.1\%}$\uparrow$), image (\textbf{5.6\%}$\uparrow$), and pixel (\textbf{6.7\%}$\uparrow$) levels across a total of ten metrics, with only \textbf{18M} parameters and less GPU memory and training time.
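The selection-then-attention step at the heart of MVAS can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the mean-pooled window descriptors, dot-product correlation score, and all tensor shapes are assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mvas_topk_attention(views, k=2):
    """Hedged sketch of Multi-View Adaptive Selection (MVAS).

    views: list of V arrays, each (num_windows, window_tokens, dim),
    i.e. feature maps already split into neighbourhood attention windows.
    For every window of one view, a semantic correlation score (here,
    dot product of mean-pooled window descriptors -- an assumption) is
    computed against all windows of the other views; attention is then
    run only against the top-k most correlated cross-view windows, so
    the cost stays linear in the number of windows.
    """
    V = len(views)
    # one mean-pooled descriptor per window: (V, num_windows, dim)
    desc = np.stack([v.mean(axis=1) for v in views])
    outputs = []
    for i in range(V):
        # candidate windows: every window of every other view
        other_desc = np.concatenate([desc[j] for j in range(V) if j != i])
        other_wins = np.concatenate([views[j] for j in range(V) if j != i])
        # semantic correlation matrix: (windows_i, windows_other)
        corr = desc[i] @ other_desc.T
        # indices of the top-k most correlated cross-view windows
        topk = np.argsort(-corr, axis=1)[:, :k]
        out = np.empty_like(views[i])
        for w in range(views[i].shape[0]):
            q = views[i][w]                                    # (T, D)
            kv = other_wins[topk[w]].reshape(-1, q.shape[-1])  # (k*T, D)
            attn = softmax(q @ kv.T / np.sqrt(q.shape[-1]), axis=-1)
            out[w] = attn @ kv
        outputs.append(out)
    return outputs
```

With three views of four windows each, every window attends to only `k` of the eight candidate cross-view windows rather than all of them, which is what keeps the complexity linear rather than quadratic in the total window count.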