Incorporating human perception into the training of convolutional neural networks (CNNs) has boosted the generalization capabilities of such models in open-set recognition tasks. An active research question is where (in the model architecture) and how to efficiently incorporate the always-limited human perceptual data into model training strategies. In this paper, we introduce MENTOR (huMan pErceptioN-guided preTraining fOr increased geneRalization), which addresses this question through two unique rounds of training CNNs tasked with open-set anomaly detection. First, we train an autoencoder to predict human saliency maps from input images, without class labels. The autoencoder is thus tasked with discovering domain-specific salient features that mimic human perception. Second, we remove the decoder, add a classification layer on top of the encoder, and fine-tune this new model conventionally. We show that MENTOR's benefits are twofold: (a) a significant accuracy boost in anomaly detection tasks (demonstrated in this paper for the detection of unknown iris presentation attacks, synthetically-generated faces, and anomalies in chest X-ray images), compared both to models using conventional transfer learning (e.g., sourcing weights from ImageNet-pretrained models) and to models trained with the state-of-the-art approach that incorporates human perception guidance into the loss function, and (b) an increase in training efficiency, requiring fewer epochs to converge than state-of-the-art training methods.
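The two-stage procedure described above can be sketched in code. The following is a minimal, illustrative sketch only: a toy fully connected autoencoder and synthetic data stand in for a real CNN, human saliency annotations, and anomaly labels. All names, dimensions, and hyperparameters here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for (image, human saliency map) pairs. Real MENTOR uses
# images and annotator-derived saliency maps; these synthetic arrays are
# illustrative only.
X = rng.normal(size=(64, 16))                 # 64 "images", 16 features each
S = np.tanh(X @ rng.normal(size=(16, 16)))    # synthetic "saliency maps"

def relu(z):
    return np.maximum(z, 0)

def mse(a, b):
    return ((a - b) ** 2).mean()

# Stage 1: train an autoencoder to predict saliency maps (no class labels).
W_enc = rng.normal(scale=0.1, size=(16, 8))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(8, 16))   # decoder weights
loss_before = mse(relu(X @ W_enc) @ W_dec, S)
lr = 0.05
for _ in range(500):
    H = relu(X @ W_enc)                       # "perception-aligned" features
    err = (H @ W_dec) - S                     # gradient of the MSE loss
    W_dec -= lr * H.T @ err / len(X)
    dH = (err @ W_dec.T) * (H > 0)
    W_enc -= lr * X.T @ dH / len(X)
loss_after = mse(relu(X @ W_enc) @ W_dec, S)

# Stage 2: remove the decoder, add a classification head on top of the
# encoder, and fine-tune conventionally (here: a logistic head only).
y = (X[:, 0] > 0).astype(float)               # toy binary "anomaly" labels
w_cls = np.zeros(8)
for _ in range(500):
    H = relu(X @ W_enc)
    p = 1.0 / (1.0 + np.exp(-(H @ w_cls)))
    w_cls -= 0.1 * H.T @ (p - y) / len(X)

probs = 1.0 / (1.0 + np.exp(-(relu(X @ W_enc) @ w_cls)))
acc = ((probs > 0.5) == y).mean()
```

In a full implementation, Stage 2 would fine-tune the encoder weights together with the new head; freezing the encoder here simply keeps the sketch short.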