We present a cost-effective two-step authentication system that combines face identification and speaker verification using only the camera and microphone found on common devices. The pipeline first performs face recognition to identify a candidate user from a small enrolled group, then verifies the speaker's voice only against that candidate's enrolled identity, which reduces computation and improves robustness. For face recognition, faces are localized with MTCNN and classified by a pruned VGG-16-based model trained on an augmented dataset of 924 images of five subjects; this model achieves 95.1% accuracy. For voice recognition, a CNN speaker-verification model trained on LibriSpeech (train-other-360) attains 98.9% accuracy and a 3.456% equal error rate (EER) on the test-clean subset. Source code and trained models are available at https://github.com/NCUE-EE-AIAL/Two-step-Authentication-Multi-biometric-System.
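The two-step decision flow can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the face model outputs a probability per enrolled user and that the speaker model outputs a fixed-length voice embedding, which is compared with the candidate's enrolled embedding by cosine similarity. The function names, both thresholds, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_step_authenticate(
    face_probs: Dict[str, float],
    voice_embedding: Sequence[float],
    enrolled_voice: Dict[str, Sequence[float]],
    face_threshold: float = 0.8,   # hypothetical value, not from the paper
    voice_threshold: float = 0.7,  # hypothetical value, not from the paper
) -> Optional[str]:
    """Return the authenticated user id, or None if either step rejects."""
    if not face_probs:
        return None
    # Step 1: face identification, a 1:N choice over the small enrolled group.
    candidate = max(face_probs, key=face_probs.get)
    if face_probs[candidate] < face_threshold:
        return None
    # Step 2: speaker verification, a 1:1 comparison against the candidate only
    # (assumed embedding + cosine scoring; the paper's CNN may score differently).
    score = cosine_similarity(voice_embedding, enrolled_voice[candidate])
    return candidate if score >= voice_threshold else None


if __name__ == "__main__":
    # Toy enrolled voiceprints (hypothetical 3-dimensional embeddings).
    enrolled = {
        "alice": [0.9, 0.1, 0.2],
        "bob": [0.1, 0.9, 0.3],
        "carol": [0.2, 0.2, 0.9],
    }
    probs = {"alice": 0.93, "bob": 0.04, "carol": 0.03}
    print(two_step_authenticate(probs, [0.88, 0.12, 0.25], enrolled))  # → alice
    print(two_step_authenticate(probs, [0.1, 0.9, 0.3], enrolled))     # → None (voice mismatch)
```

Because step 2 compares against a single enrolled voiceprint rather than searching all users, its cost stays constant as the enrolled group grows. The face step only has to narrow the choice to one candidate.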