Spoken language understanding (SLU), one of the key enabling technologies for human-computer interaction in IoT devices, provides an easy-to-use user interface. Human speech, however, can carry a great deal of sensitive user information, such as gender, identity, and sensitive content, and this has given rise to new types of security and privacy breaches. Users do not want their personal information exposed to attacks by untrusted third parties. An SLU system therefore needs to ensure that a potential attacker cannot deduce users' sensitive attributes, without greatly compromising SLU accuracy. To address this challenge, this paper proposes a novel multi-task privacy-preserving SLU model that defends against both automatic speech recognition (ASR) and identity recognition (IR) attacks. The model uses a hidden-layer separation technique so that SLU information is distributed only in a specific portion of the hidden layer, while the other two types of information, those exploited by ASR and IR attacks, are removed to obtain a privacy-secure hidden layer. To achieve a good balance between efficiency and privacy, we introduce a new model pre-training mechanism, joint adversarial training, which further enhances user privacy. Experiments on two SLU datasets show that the proposed method reduces the accuracy of both ASR and IR attacks to close to that of a random guess, while leaving SLU performance largely unaffected.
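As a rough illustration of the hidden-layer separation idea described in the abstract, the sketch below partitions a hidden vector into three segments and keeps only the SLU segment. It is a minimal sketch under stated assumptions, not the paper's implementation: the contiguous layout, the segment names (`slu`, `asr`, `ir`), the segment sizes, and the helper names `split_hidden` and `privacy_secure_hidden` are all hypothetical. It shows only the partition-and-discard step; the training procedure that would make each type of information concentrate in its own segment is not modelled here.

```python
from typing import Dict, List, Sequence


def split_hidden(hidden: Sequence[float], sizes: Dict[str, int]) -> Dict[str, List[float]]:
    """Partition a hidden vector into named, contiguous segments.

    The contiguous layout and the segment names are assumptions made
    for illustration only.
    """
    if sum(sizes.values()) != len(hidden):
        raise ValueError("segment sizes must sum to the hidden dimension")
    parts: Dict[str, List[float]] = {}
    start = 0
    for name, size in sizes.items():
        parts[name] = list(hidden[start:start + size])
        start += size
    return parts


def privacy_secure_hidden(hidden: Sequence[float], sizes: Dict[str, int],
                          keep: str = "slu") -> List[float]:
    """Return only the segment assumed to carry SLU information.

    The remaining segments (assumed here to hold ASR- and IR-related
    information) are dropped.
    """
    return split_hidden(hidden, sizes)[keep]


# Hypothetical 6-dimensional hidden vector and segment sizes.
hidden = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
sizes = {"slu": 2, "asr": 3, "ir": 1}
secure = privacy_secure_hidden(hidden, sizes)
```

In this toy setup only the `slu` segment would be passed on to a downstream component, so the dimensions assumed to hold ASR and IR information never leave the model.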