Replay attacks remain a critical vulnerability for automatic speaker verification systems, particularly in real-time voice assistant applications. In this work, we propose acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. Derived from classical beamforming over discrete azimuth and elevation grids, acoustic maps encode directional energy distributions that reflect physical differences between human speech radiation and loudspeaker-based replay. We design a lightweight convolutional neural network that operates on this representation and achieves competitive performance on the ReMASC dataset with approximately 6k trainable parameters. Experimental results show that acoustic maps provide a compact and physically interpretable feature space for replay attack detection across different devices and acoustic environments.
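To make the idea of an acoustic map concrete, the following is a minimal sketch of how a directional energy map could be computed with a frequency-domain delay-and-sum beamformer over an azimuth/elevation grid. The abstract does not specify the beamformer variant, grid resolution, array geometry, or any normalization, so all of these are assumptions here: the far-field plane-wave model, the 36 x 9 grid, the speed of sound, and the function names `steering_directions` and `acoustic_map` are hypothetical and chosen only for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_directions(n_az=36, n_el=9):
    """Unit look vectors on a discrete azimuth/elevation grid (upper hemisphere)."""
    az = np.linspace(-np.pi, np.pi, n_az, endpoint=False)
    el = np.linspace(0.0, np.pi / 2, n_el)
    az_g, el_g = np.meshgrid(az, el, indexing="ij")
    dirs = np.stack(
        [np.cos(el_g) * np.cos(az_g), np.cos(el_g) * np.sin(az_g), np.sin(el_g)],
        axis=-1,
    )
    return az, el, dirs  # dirs has shape (n_az, n_el, 3)


def acoustic_map(frame, mic_pos, fs, n_az=36, n_el=9):
    """Delay-and-sum steered response power for one multi-channel frame.

    frame:   (n_mics, n_samples) time-domain samples
    mic_pos: (n_mics, 3) microphone coordinates in metres
    Returns the azimuth grid, the elevation grid, and a (n_az, n_el) power map.
    """
    n_mics, n_samples = frame.shape
    spec = np.fft.rfft(frame, axis=1)                 # (n_mics, n_freq)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)    # (n_freq,)
    az, el, dirs = steering_directions(n_az, n_el)

    # Far-field assumption: a plane wave arriving from direction u reaches
    # microphone m at time -(p_m . u) / c relative to the array origin.
    delays = -(dirs @ mic_pos.T) / SPEED_OF_SOUND     # (n_az, n_el, n_mics)

    # Undo each candidate delay with a phase shift, then average the channels.
    phase = np.exp(2j * np.pi * delays[..., None] * freqs)
    beam = (phase * spec).sum(axis=2) / n_mics        # (n_az, n_el, n_freq)

    # Sum the beamformer output power over frequency to get one value per direction.
    power = (np.abs(beam) ** 2).sum(axis=-1)
    return az, el, power
```

Calling `acoustic_map(frame, mic_pos, fs)` on a short frame yields a small 2-D array that can be treated like a single-channel image, which is the kind of input a compact convolutional network can consume. Whether the paper stacks maps over time or frequency bands is not stated in the abstract, so this sketch collapses everything into one broadband map.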