Due to data imbalance and the diversity of defects, student-teacher (S-T) networks are favored in unsupervised anomaly detection; they exploit the discrepancy in feature representations that arises from knowledge distillation to recognize anomalies. However, the vanilla S-T network is unstable. Using identical structures for the student and the teacher may weaken the representational discrepancy on anomalies, whereas using different structures increases the likelihood of divergent behavior on normal data. To address this problem, we propose a novel dual-student knowledge distillation (DSKD) architecture. Unlike other S-T networks, DSKD uses two student networks and a single pre-trained teacher network, where the students have the same scale but inverted structures. This framework strengthens the distillation effect to improve consistency in recognizing normal data while introducing diversity into anomaly representation. To exploit high-dimensional semantic information for capturing anomaly clues, we employ two strategies. First, a pyramid matching mode performs knowledge distillation on multi-scale feature maps from the intermediate layers of the networks. Second, a deep feature embedding module, inspired by real-world group discussions, facilitates interaction between the two student networks. For classification, we obtain pixel-wise anomaly segmentation maps by measuring the discrepancy between the output feature maps of the teacher and student networks, from which an anomaly score is computed for sample-wise determination. We evaluate DSKD on three benchmark datasets and probe the effects of its internal modules through ablation experiments. The results demonstrate that DSKD achieves exceptional performance with small models such as ResNet18 and effectively improves on vanilla S-T networks.
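The scoring step described in the abstract (pixel-wise maps from teacher-student feature discrepancies, then a sample-level score) can be illustrated with a minimal NumPy sketch. The abstract does not specify the distance measure, the fusion rule across scales and students, or the scoring rule, so the cosine distance, the summation, and the maximum used here are assumptions, and all function names are hypothetical. Feature maps are assumed to be square arrays of shape (C, H, W) whose side length divides the output size.

```python
import numpy as np


def cosine_distance_map(teacher_feat, student_feat, eps=1e-8):
    """Per-pixel cosine distance between two (C, H, W) feature maps (assumed metric)."""
    dot = (teacher_feat * student_feat).sum(axis=0)
    norms = np.linalg.norm(teacher_feat, axis=0) * np.linalg.norm(student_feat, axis=0)
    return 1.0 - dot / (norms + eps)


def upsample_nearest(score_map, out_size):
    """Nearest-neighbour upsampling of a square (H, H) map by an integer factor."""
    factor = out_size // score_map.shape[0]
    return np.repeat(np.repeat(score_map, factor, axis=0), factor, axis=1)


def anomaly_map(teacher_feats, students_feats, out_size):
    """Sum per-scale discrepancy maps over all scales and both students (assumed fusion)."""
    total = np.zeros((out_size, out_size))
    for student_feats in students_feats:
        for t, s in zip(teacher_feats, student_feats):
            total += upsample_nearest(cosine_distance_map(t, s), out_size)
    return total


def anomaly_score(amap):
    """Sample-level score taken as the maximum pixel value (assumed rule)."""
    return float(amap.max())


# Toy usage: two scales, two students; student_b disagrees at one location.
rng = np.random.default_rng(0)
teacher = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]
student_a = [f.copy() for f in teacher]
student_b = [f.copy() for f in teacher]
student_b[1][:, 2, 3] *= -1.0  # flip the feature vector at one coarse-scale pixel
amap = anomaly_map(teacher, [student_a, student_b], out_size=16)
score = anomaly_score(amap)
```

In this toy setup the flipped feature vector yields a high-discrepancy patch in the fused map at the corresponding upsampled location, while locations where both students match the teacher stay near zero.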