Privacy-preserving voice protection approaches primarily suppress privacy-related information derived from paralinguistic attributes while preserving the linguistic content. Existing solutions focus mainly on single-speaker scenarios, which limits their practicality in real-world applications, where recordings often involve multiple speakers. In this paper, we present an initial attempt to provide a multi-speaker anonymization benchmark by defining the task and evaluation protocol, proposing benchmark solutions, and discussing the privacy leakage of overlapping conversations. The proposed benchmark solutions are based on a cascaded system that integrates spectral-clustering-based speaker diarization and disentanglement-based speaker anonymization using a selection-based anonymizer. To improve utility, the benchmark solutions are further enhanced by two conversation-level speaker vector anonymization methods. The first method minimizes the differential similarity across speaker pairs in the original and anonymized conversations, thereby preserving the original speaker relationships in the anonymized version. The second minimizes the aggregated similarity across anonymized speakers, thereby achieving better differentiation between speakers. Experiments conducted on both non-overlapping simulated and real-world datasets demonstrate the effectiveness of the multi-speaker anonymization system with the proposed speaker anonymizers. Additionally, we analyze overlapping speech with respect to privacy leakage and provide potential solutions.
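The two conversation-level objectives are stated only in words here. The following minimal Python sketch makes them concrete under our own assumptions: cosine similarity is the speaker-similarity measure, the differential objective sums the absolute change in pairwise similarity between original and anonymized speakers, the aggregated objective sums pairwise similarities among anonymized speakers, and pseudo-speaker vectors are chosen by exhaustive search over a small external pool. The function names (`cosine`, `differential_similarity`, `aggregated_similarity`, `select_pseudo_speakers`) are hypothetical, and the sketch omits parts of a real selection-based anonymizer, such as excluding pool vectors close to the original speaker and averaging several selected vectors.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two speaker vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("speaker vectors must be non-zero")
    return dot / (norm_a * norm_b)


def differential_similarity(original: Sequence[Vector], anonymized: Sequence[Vector]) -> float:
    """Assumed form: sum over speaker pairs of |sim(original pair) - sim(anonymized pair)|.
    Low values mean the anonymized conversation keeps the original speaker relationships."""
    return sum(
        abs(cosine(original[i], original[j]) - cosine(anonymized[i], anonymized[j]))
        for i, j in itertools.combinations(range(len(original)), 2)
    )


def aggregated_similarity(anonymized: Sequence[Vector]) -> float:
    """Assumed form: sum of pairwise similarities among anonymized speakers.
    Low values mean the pseudo-speakers are well separated from each other."""
    return sum(
        cosine(anonymized[i], anonymized[j])
        for i, j in itertools.combinations(range(len(anonymized)), 2)
    )


def aggregated_objective(original: Sequence[Vector], anonymized: Sequence[Vector]) -> float:
    """Adapter so both objectives share one signature; the original vectors are unused."""
    return aggregated_similarity(anonymized)


Objective = Callable[[Sequence[Vector], Sequence[Vector]], float]


def select_pseudo_speakers(
    original: Sequence[Vector], pool: Sequence[Vector], objective: Objective
) -> Tuple[List[int], float]:
    """Exhaustively assign a distinct pool vector to each speaker, minimizing `objective`.
    Returns the chosen pool indices (one per speaker, in speaker order) and the cost.
    Ties keep the first assignment found. Feasible only for tiny pools."""
    if len(pool) < len(original):
        raise ValueError("pool must contain at least one vector per speaker")
    best: List[int] = []
    best_cost = math.inf
    for combo in itertools.permutations(range(len(pool)), len(original)):
        cost = objective(original, [pool[k] for k in combo])
        if cost < best_cost:
            best, best_cost = list(combo), cost
    return best, best_cost


if __name__ == "__main__":
    # Two original speakers whose vectors are orthogonal (cosine similarity 0).
    original = [[3.0, 4.0], [-4.0, 3.0]]
    # A toy external pool of candidate pseudo-speaker vectors.
    pool = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
    print(select_pseudo_speakers(original, pool, differential_similarity))  # ([0, 2], 0.0)
    print(select_pseudo_speakers(original, pool, aggregated_objective))     # ([0, 1], -1.0)
```

On this toy input, the differential objective picks an orthogonal pair from the pool, mirroring the relationship between the original speakers, while the aggregated objective picks two opposite vectors, pushing the pseudo-speakers as far apart as possible. The exhaustive search grows factorially with the number of speakers, so a practical system would need a greedy or optimization-based search instead.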