SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems

Membership inference attacks allow adversaries to determine whether a particular example was contained in a model's training dataset. While previous works have confirmed the feasibility of such attacks in various applications, none has focused on speaker recognition (SR), a promising voice-based biometric recognition technique. In this work, we propose SLMIA-SR, the first membership inference attack tailored to SR. In contrast to conventional example-level attacks, our attack features speaker-level membership inference, i.e., determining whether any voices of a given speaker, either the same as or different from the given inference voices, have been involved in the training of a model. This is particularly useful and practical since the training and inference voices are usually distinct, and it is also meaningful considering the open-set nature of SR, namely, that the speakers to be recognized are often not present in the training data. We utilize intra-closeness and inter-farness, two training objectives of SR, to characterize the differences between training and non-training speakers, and we quantify them with two groups of features, driven by carefully established feature engineering, to mount the attack. To improve the generalizability of our attack, we propose a novel mixing ratio training strategy to train attack models. To enhance the attack performance, we introduce voice chunk splitting to cope with the limited number of inference voices and propose to train attack models dependent on the number of inference voices. Our attack is versatile and can work in both white-box and black-box scenarios. Additionally, we propose two novel techniques to reduce the number of black-box queries while maintaining the attack performance. Extensive experiments demonstrate the effectiveness of SLMIA-SR.
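To make the notions of intra-closeness, inter-farness, and voice chunk splitting concrete, the following is a minimal Python sketch and not the feature set used by SLMIA-SR. It assumes that each voice has already been mapped to an embedding vector by some SR model, that similarity is measured with cosine similarity, and that a single mean statistic stands in for each feature group. All function names (`split_into_chunks`, `intra_closeness`, `inter_farness`) are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def split_into_chunks(samples, chunk_len):
    """Split one voice (a list of samples) into non-overlapping chunks.

    Assumption: a trailing piece shorter than chunk_len is dropped.
    The idea is to obtain more inference voices from a few recordings.
    """
    last_start = len(samples) - chunk_len
    return [samples[i:i + chunk_len] for i in range(0, last_start + 1, chunk_len)]


def intra_closeness(embeddings):
    """Mean pairwise cosine similarity among one speaker's embeddings.

    Assumption: a higher value means the model places this speaker's
    voices closer together.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings of the speaker")
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def inter_farness(embeddings, other_embeddings):
    """Mean cosine distance (1 - similarity) to other speakers' embeddings.

    Assumption: a higher value means the model separates this speaker
    more strongly from the others.
    """
    if not embeddings or not other_embeddings:
        raise ValueError("both embedding lists must be non-empty")
    sims = [cosine(u, v) for u in embeddings for v in other_embeddings]
    return 1.0 - sum(sims) / len(sims)


# Toy usage with hand-written 2-D "embeddings" (purely illustrative).
target = [[1.0, 0.0], [1.0, 0.0]]
others = [[0.0, 1.0]]
features = (intra_closeness(target), inter_farness(target, others))
```

In such a sketch, the resulting feature tuple would be fed to a separately trained attack model that outputs a member or non-member decision; how that model is trained, including the mixing ratio strategy, is not covered here.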
