Automatic Speaker Verification (ASV) systems are increasingly used in voice biometrics for user authentication, but they are susceptible to logical (LA) and physical (PA) spoofing attacks, which pose security risks. Existing research mainly tackles logical or physical attacks separately, leaving a gap in unified spoofing detection. Moreover, when existing systems attempt to handle both types of attack, they often exhibit a significant disparity in Equal Error Rate (EER) between the two. To bridge this gap, we present a Parallel Stacked Aggregation Network that operates on raw audio. Our approach employs a split-transform-aggregate technique: it divides each utterance into convolved representations, applies transformations to them, and aggregates the results to identify LA and PA spoofing attacks. Evaluation on the ASVspoof-2019 and VSDC datasets shows the effectiveness of the proposed system. It outperforms state-of-the-art solutions, with a smaller EER disparity between attack types and better spoofing detection performance, which indicates that the method generalizes across attack types. As voice-based security becomes more widespread, our unified spoofing detection system provides a robust defense against a spectrum of voice spoofing attacks, safeguarding ASV systems and user data.
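The split-transform-aggregate idea can be illustrated with a minimal sketch. The abstract does not specify layer types, kernel sizes, or the number of parallel branches, so everything below (the function names, the per-branch affine map with ReLU, summation as the aggregation step, and the toy kernels) is an assumption made for illustration, not the network described above. Pure Python is used so the example is self-contained.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D correlation of a list of floats with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def split(waveform, kernels):
    """Split: produce one convolved representation of the raw waveform per branch."""
    return [conv1d(waveform, kernel) for kernel in kernels]


def transform(branch, weight, bias):
    """Transform: hypothetical per-branch affine map followed by ReLU."""
    return [max(0.0, weight * x + bias) for x in branch]


def aggregate(branches):
    """Aggregate: element-wise sum across all branches (an assumed choice)."""
    return [sum(values) for values in zip(*branches)]


def split_transform_aggregate(waveform, kernels, weights, biases):
    """Run the three stages in order on a single utterance."""
    branches = split(waveform, kernels)
    transformed = [
        transform(branch, w, b)
        for branch, w, b in zip(branches, weights, biases)
    ]
    return aggregate(transformed)


# Toy usage: a 4-sample "utterance" and two parallel branches.
waveform = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, 0.0], [0.0, 1.0]]   # illustrative kernels, not learned
weights = [1.0, -1.0]
biases = [0.0, 5.0]
features = split_transform_aggregate(waveform, kernels, weights, biases)
```

In a trained model the kernels, weights, and biases would be learned, and the aggregated features would feed a classifier that labels the utterance as bona fide or spoofed. The sketch only shows how the three stages compose.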