In recent years, remarkable advances in deep neural networks have brought tremendous convenience. However, training a highly effective model requires a substantial quantity of samples, which poses serious potential threats, such as unauthorized exploitation and privacy leakage. In response, we propose HiddenSpeaker, a framework that embeds imperceptible perturbations in training speech samples, rendering them unlearnable for deep-learning-based speaker verification systems that rely on large-scale speaker data for efficient training. HiddenSpeaker uses a simplified error-minimizing method, Single-Level Error-Minimizing (SLEM), to generate specific and effective perturbations. In addition, a hybrid objective function is employed for human perceptual optimization, ensuring that the perturbation is imperceptible to human listeners. We conduct extensive experiments on multiple state-of-the-art (SOTA) speaker verification models to evaluate HiddenSpeaker. Our results demonstrate that HiddenSpeaker not only deceives models with unlearnable samples but also enhances the imperceptibility of the perturbations, and shows strong transferability across different models.
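The core idea of error-minimizing perturbations can be illustrated with a minimal sketch: holding a surrogate model fixed, descend its training loss with respect to a norm-bounded perturbation, so the perturbed sample looks "already learned" and contributes little gradient signal during training. The toy logistic model, step sizes, and budget below are illustrative assumptions, not the paper's actual speaker verification architecture or hyperparameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def slem_perturb(x, y, w, b, eps=0.05, lr=0.05, steps=200):
    """Single-level error-minimizing sketch: minimize the *fixed* surrogate's
    loss w.r.t. an L-infinity bounded perturbation delta (|delta_i| <= eps).
    The surrogate here is a toy logistic unit; in practice it would be a
    pretrained speaker model (assumption, not the paper's exact setup)."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        # Surrogate logit on the perturbed sample x + delta.
        z = sum((xi + di) * wi for xi, di, wi in zip(x, delta, w)) + b
        g = sigmoid(z) - y  # dL/dz for binary cross-entropy
        # Gradient step that *decreases* the loss, then project back
        # into the imperceptibility budget [-eps, eps].
        delta = [max(-eps, min(eps, di - lr * g * wi))
                 for di, wi in zip(delta, w)]
    return delta

# Usage: the perturbed sample yields a lower surrogate loss than the clean
# one, so a model trained on it receives almost no learning signal.
x, y = [0.3, -1.2, 0.7, 0.1], 1.0
w, b = [0.8, -0.5, 1.1, -0.2], 0.0
delta = slem_perturb(x, y, w, b)
```

Being "single-level" means only the inner minimization over the perturbation is performed against a fixed surrogate, rather than alternating with model retraining as in bi-level error-minimizing formulations.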