With the rapid advancement of smart glasses, voice interaction has been widely adopted for its naturalness and convenience. However, its practical deployment is often undermined by vulnerability to spoofing attacks, and no public dataset currently exists for voice liveness detection and authentication in smart-glasses scenarios. To address these gaps, we first collect a multi-acoustic-modal dataset comprising 16-channel audio data from 42 subjects, along with corresponding attack samples covering two attack categories. Based on insights derived from this data, we propose AuthG-Live, a sound-field-based voice liveness detection method, and AuthG-Net, a multi-acoustic-modal authentication model. We further benchmark seven voice liveness detection methods and four authentication methods across diverse acoustic modalities. The results demonstrate that our proposed methods achieve state-of-the-art performance on four benchmark tasks, and extensive ablation studies validate their generalizability under real-world constraints. Finally, we release this dataset, termed AuthGlass, to facilitate future research on voice liveness detection and authentication for smart glasses.