Incorporating human-perceptual intelligence into model training has been shown to increase the generalization capability of models in several difficult biometric tasks, such as presentation attack detection (PAD) and detection of synthetic samples. After the initial collection phase, human visual saliency (e.g., eye-tracking data or handwritten annotations) can be integrated into model training through attention mechanisms, augmented training samples, or human perception-related components of loss functions. Despite these successes, a vital but seemingly neglected aspect of any saliency-based training is the level of salience granularity (e.g., bounding boxes, single saliency maps, or saliency aggregated from multiple subjects) needed to balance the full benefits of human saliency against the cost of its collection. In this paper, we explore several levels of salience granularity and demonstrate that simple yet effective saliency post-processing techniques can increase the generalization capability of PAD and synthetic face detection across several different CNNs.
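
To make the notion of salience granularity concrete, the following is a minimal illustrative sketch, not the post-processing used in this work. It assumes each subject's annotation is a binary mask of the same size as the image, and the function names `aggregate_saliency` and `to_bounding_box` are hypothetical. It shows how one finer level (saliency aggregated from multiple subjects) and one coarser level (a bounding box) might be derived from the same raw annotations.

```python
import numpy as np


def aggregate_saliency(maps):
    """Average per-subject binary masks into one saliency map with values in [0, 1]."""
    stacked = np.stack([np.asarray(m, dtype=np.float64) for m in maps])
    return stacked.mean(axis=0)


def to_bounding_box(saliency, threshold=0.0):
    """Coarsen a saliency map to the tightest axis-aligned box around values above threshold."""
    saliency = np.asarray(saliency, dtype=np.float64)
    mask = saliency > threshold
    box = np.zeros_like(saliency)
    if not mask.any():
        return box  # nothing salient: return an empty map
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    box[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1] = 1.0
    return box


# Toy 4x4 annotations from two hypothetical subjects.
subject_a = np.zeros((4, 4))
subject_a[1, 1] = subject_a[1, 2] = 1
subject_b = np.zeros((4, 4))
subject_b[1, 2] = subject_b[2, 2] = 1

aggregated = aggregate_saliency([subject_a, subject_b])  # finer: multi-subject agreement
box = to_bounding_box(aggregated)                        # coarser: bounding box only
```

In this toy case the aggregated map keeps how strongly the subjects agree on each pixel, while the bounding box discards that information in exchange for an annotation that is cheaper to collect.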