Replay attacks remain a critical vulnerability for automatic speaker verification systems, particularly in real-time voice assistant applications. In this work, we propose acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. Derived from classical beamforming over discrete azimuth and elevation grids, acoustic maps encode directional energy distributions that reflect physical differences between human speech radiation and loudspeaker-based replay. A lightweight convolutional neural network is designed to operate on this representation, achieving competitive performance on the ReMASC dataset with approximately 6k trainable parameters. Experimental results show that acoustic maps provide a compact and physically interpretable feature space for replay attack detection across different devices and acoustic environments.
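The abstract states only that acoustic maps come from classical beamforming over a discrete azimuth and elevation grid. The following is a minimal sketch of one common choice, frequency-domain delay-and-sum steered response power. It should not be read as the authors' implementation. It assumes:

- a far-field plane-wave model,
- a single STFT frame,
- a speed of sound of 343 m/s,
- a hypothetical four-microphone geometry and test frequencies.

The helper names (`acoustic_map`, `simulate_plane_wave`, and so on) are illustrative. For each grid direction, the code undoes the per-microphone phase advance a plane wave from that direction would cause, averages across channels, and sums the resulting power over frequency bins. Directions that match the true arrival direction add coherently and produce the largest energy.

```python
import cmath
import math
from typing import List, Sequence, Tuple

# Assumed speed of sound in air (m/s); not specified in the abstract.
SPEED_OF_SOUND = 343.0

Position = Tuple[float, float, float]


def direction_vector(azimuth_deg: float, elevation_deg: float) -> Position:
    """Unit vector pointing from the array origin toward (azimuth, elevation)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def arrival_advances(mics: Sequence[Position], azimuth_deg: float,
                     elevation_deg: float) -> List[float]:
    """Far-field model: seconds by which a plane wave from the given direction
    reaches each microphone before it reaches the array origin."""
    u = direction_vector(azimuth_deg, elevation_deg)
    return [sum(p_i * u_i for p_i, u_i in zip(p, u)) / SPEED_OF_SOUND for p in mics]


def acoustic_map(stft_frame: Sequence[Sequence[complex]], freqs_hz: Sequence[float],
                 mics: Sequence[Position], azimuths_deg: Sequence[float],
                 elevations_deg: Sequence[float]) -> List[List[float]]:
    """Delay-and-sum steered response power on an elevation x azimuth grid.

    stft_frame[m][k] is the complex STFT coefficient of microphone m at
    frequency freqs_hz[k]. Returns energy[e][a] for elevations_deg[e] and
    azimuths_deg[a].
    """
    n_mics = len(mics)
    energy_map = []
    for el in elevations_deg:
        row = []
        for az in azimuths_deg:
            taus = arrival_advances(mics, az, el)
            energy = 0.0
            for k, f in enumerate(freqs_hz):
                # Compensate each channel's phase advance, then average (delay-and-sum).
                y = sum(stft_frame[m][k] * cmath.exp(-2j * math.pi * f * taus[m])
                        for m in range(n_mics)) / n_mics
                energy += abs(y) ** 2
            row.append(energy)
        energy_map.append(row)
    return energy_map


def simulate_plane_wave(mics: Sequence[Position], freqs_hz: Sequence[float],
                        azimuth_deg: float, elevation_deg: float) -> List[List[complex]]:
    """Hypothetical test helper: unit-amplitude plane wave from one direction."""
    taus = arrival_advances(mics, azimuth_deg, elevation_deg)
    return [[cmath.exp(2j * math.pi * f * tau) for f in freqs_hz] for tau in taus]


if __name__ == "__main__":
    # Hypothetical non-coplanar array with 4 cm spacing.
    mics = [(0.0, 0.0, 0.0), (0.04, 0.0, 0.0), (0.0, 0.04, 0.0), (0.0, 0.0, 0.04)]
    freqs = [1000.0, 2000.0, 3000.0]
    azs = list(range(0, 360, 30))
    els = list(range(-60, 61, 30))
    amap = acoustic_map(simulate_plane_wave(mics, freqs, 60, 30), freqs, mics, azs, els)
    e, a = max(((i, j) for i in range(len(els)) for j in range(len(azs))),
               key=lambda ij: amap[ij[0]][ij[1]])
    print(f"peak at elevation {els[e]}, azimuth {azs[a]}")  # peak at elevation 30, azimuth 60
```

The elevation-by-azimuth matrix returned by `acoustic_map` is the kind of 2D input a small convolutional network could consume. A practical pipeline would likely also average over frames, use more frequency bins, and normalize the map. The grid resolution, array geometry, and beamformer variant used in the paper may differ from this sketch.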