In recent years, the remarkable advances in deep neural networks have brought tremendous convenience. However, training a highly effective model requires a substantial quantity of samples, which poses serious potential threats, such as unauthorized data exploitation and privacy leakage. In response, we propose a framework named HiddenSpeaker that embeds imperceptible perturbations within training speech samples, rendering them unlearnable for deep-learning-based speaker verification systems that rely on large-scale speaker data for efficient training. HiddenSpeaker utilizes a simplified error-minimizing method named Single-Level Error-Minimizing (SLEM) to generate specific and effective perturbations. In addition, a hybrid objective function is employed for human perceptual optimization, ensuring that the perturbation is imperceptible to human listeners. We conduct extensive experiments on multiple state-of-the-art (SOTA) speaker verification models to evaluate HiddenSpeaker. Our results demonstrate that HiddenSpeaker not only deceives models with unlearnable samples but also improves the imperceptibility of the perturbations, and that the perturbations transfer strongly across different models.
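The core idea behind error-minimizing ("unlearnable") perturbations can be illustrated with a toy sketch. This is not the authors' implementation: the linear model, loss, perturbation budget `eps`, step size, and helper names are all illustrative assumptions. The single-level loop descends the training loss with respect to the perturbation `delta` (rather than solving a bi-level problem), so the perturbed sample carries almost no residual learning signal for the model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad_wrt_x(w, x, y):
    """Binary cross-entropy of a toy linear model; gradient w.r.t. input x."""
    p = sigmoid(w @ x)
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w  # dL/dx for the linear model
    return loss, grad_x

def slem_like_perturbation(w, x, y, eps=0.1, lr=0.05, steps=100):
    """Single-level loop: descend the loss w.r.t. delta, clip to the eps-ball.

    Hypothetical sketch of error-minimizing noise; the real SLEM operates
    on speech samples and a speaker verification objective.
    """
    delta = np.zeros_like(x)
    for _ in range(steps):
        _, g = loss_and_grad_wrt_x(w, x + delta, y)
        delta -= lr * np.sign(g)           # minimize (not maximize) the loss
        delta = np.clip(delta, -eps, eps)  # imperceptibility budget
    return delta

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # fixed toy model weights
x = rng.normal(size=8)   # one "training sample"
y = 1.0
l0, _ = loss_and_grad_wrt_x(w, x, y)
delta = slem_like_perturbation(w, x, y)
l1, _ = loss_and_grad_wrt_x(w, x + delta, y)
print(l1 < l0)  # the perturbed sample yields a lower training loss
```

Because the perturbed sample is already near-zero loss, gradient-based training extracts little from it; HiddenSpeaker additionally shapes the perturbation with a perceptual objective so that it remains inaudible, which this sketch does not model.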