Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment. In this study, we propose integrating two commonly used privacy-preservation approaches: source separation and adversarial representation learning. The proposed system learns a latent representation of audio recordings that prevents speech recordings from being distinguished from non-speech recordings. First, the source separation network filters out some of the privacy-sensitive data; then, during adversarial learning, the system learns a privacy-preserving representation of the filtered signal. We demonstrate the effectiveness of the proposed method by comparing it against systems without source separation, without adversarial learning, and without both. Overall, our results suggest that the proposed system can significantly improve speech privacy preservation compared with using source separation or adversarial learning alone, while maintaining good performance in the acoustic monitoring task.
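As an illustration only, the two-stage idea can be written as a min-max objective. The abstract does not state the loss, so every symbol here is an assumption introduced for this sketch: $S$ is the source separation network (treated as fixed), $E$ the encoder producing the latent representation, $C$ the classifier for the acoustic monitoring task with label $y$, $D$ an adversary that tries to predict the speech/non-speech label $s$, and $\lambda > 0$ a trade-off weight.

```latex
% Hypothetical formulation; notation and loss form are assumptions, not taken from the paper.
\begin{align}
  \tilde{x} &= S(x), \qquad z = E(\tilde{x}), \\
  \min_{E,\,C}\ \max_{D}\quad
    & \mathcal{L}_{\mathrm{task}}\bigl(C(z),\, y\bigr)
      \;-\; \lambda\, \mathcal{L}_{\mathrm{adv}}\bigl(D(z),\, s\bigr)
\end{align}
```

Under this reading, the adversary $D$ minimizes its own speech-detection loss $\mathcal{L}_{\mathrm{adv}}$, while $E$ is trained to keep that loss high and the monitoring loss $\mathcal{L}_{\mathrm{task}}$ low. Because $E$ only sees the filtered signal $\tilde{x}$, part of the speech content would already be removed before the adversarial stage begins.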