Knowledge Distillation (KD) is a promising approach for unsupervised Anomaly Detection (AD). However, the student network's over-generalization often diminishes the crucial representation differences between the teacher and the student in anomalous regions, leading to detection failures. To address this problem, the widely adopted Reverse Distillation (RD) paradigm designs an asymmetric teacher-student pair, using an encoder as the teacher and a decoder as the student. Yet the RD design neither ensures that the teacher encoder effectively distinguishes normal from abnormal features, nor that the student decoder generates anomaly-free features. Additionally, the absence of skip connections causes a loss of fine details during feature reconstruction. To address these issues, we propose RD with Expert, which introduces a novel Expert-Teacher-Student network that distills the teacher encoder and the student decoder simultaneously. The added expert network enhances the student's ability to generate normal features and sharpens the teacher's discrimination between normal and abnormal features, reducing missed detections. Additionally, Guided Information Injection is designed to filter and transfer features from teacher to student, improving detail reconstruction and minimizing false positives. Experiments on several benchmarks demonstrate that our method outperforms existing unsupervised AD methods under the RD paradigm, fully unlocking RD's potential.