With the rapid advancement of smart glasses, voice interaction has been widely adopted for its naturalness and convenience. However, its practical deployment is often undermined by vulnerability to spoofing attacks, and no public dataset currently exists for voice liveness detection and authentication in smart-glasses scenarios. To address this gap, we first collect a multi-acoustic-modal dataset comprising 16-channel audio from 42 subjects, along with corresponding attack samples covering two attack categories. Based on insights derived from this data, we propose AuthG-Live, a sound-field-based voice liveness detection method, and AuthG-Net, a multi-acoustic-modal authentication model. We further benchmark seven voice liveness detection methods and four authentication methods across diverse acoustic modalities. The results demonstrate that our approaches achieve state-of-the-art performance on four benchmark tasks, and extensive ablation studies validate their generalizability across different modality combinations. Finally, we release the dataset, termed AuthGlass, to facilitate future research on voice liveness detection and authentication for smart glasses.