Lip-based biometric authentication (LBBA) has attracted many researchers during the last decade. The lip is of particular interest to biometric researchers because it is a twin biometric with the potential to function as both a physiological and a behavioral trait. Although much valuable research has been conducted on LBBA, none of it has considered the client's emotional state during the video acquisition step, even though emotions can affect the client's facial expressions and speech tempo. We propose a novel network structure called WhisperNetV2, which extends our previously proposed network, WhisperNet. The proposed network uses a deep Siamese structure trained with triplet loss, with three identical SlowFast networks serving as embedding networks. The SlowFast network is well suited to this task: its fast pathway extracts motion-related features (behavioral lip movements) at a high frame rate with low channel capacity, while its slow pathway extracts visual features (physiological lip appearance) at a low frame rate with high channel capacity. Following an open-set protocol, we trained our network on the CREMA-D dataset and achieved an Equal Error Rate (EER) of 0.005 on the test set. Since this EER is lower than that of most comparable LBBA methods, our method can be considered a state-of-the-art LBBA method.
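The two ideas the abstract relies on, a Siamese triplet objective and the EER metric, can be illustrated with a minimal, dependency-free sketch. This is not the WhisperNetV2 implementation: `embed` is a hypothetical stand-in for the shared SlowFast embedding network (it only L2-normalizes a precomputed feature vector), the margin of 0.2 is an assumed value, and the EER is approximated as the midpoint of FAR and FRR at the distance threshold where the two rates are closest.

```python
import math
from typing import List, Sequence, Tuple


def embed(clip: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for the shared SlowFast embedding network:
    # it simply L2-normalizes a precomputed feature vector.
    norm = math.sqrt(sum(x * x for x in clip)) or 1.0
    return [x / norm for x in clip]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def siamese_triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    # Siamese weight sharing: the SAME embedding function maps all three inputs.
    a, p, n = embed(anchor), embed(positive), embed(negative)
    # Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Approximate EER from distance scores; a claim is accepted when distance <= t.

    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    best_gap, best_eer, best_t = float("inf"), 0.0, 0.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(d <= t for d in impostor) / len(impostor)  # impostors wrongly accepted
        frr = sum(d > t for d in genuine) / len(genuine)      # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, t
    return best_eer, best_t
```

For example, `siamese_triplet_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])` returns `0.0` because the negative is already farther from the anchor than the positive by more than the margin, and perfectly separated genuine and impostor distances such as `[0.1, 0.2, 0.3]` and `[0.7, 0.8, 0.9]` give an EER of `0.0`.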