Sex conversion in speech carries privacy risks from data collection and often leaves residual sex-specific cues in its outputs, even when target speaker references are unavailable. We introduce RASO (Reference-free Adversarial Sex Obfuscation). Its innovations include a sex-conditional adversarial learning framework that disentangles linguistic content from sex-related acoustic markers, and explicit regularisation that aligns fundamental frequency distributions and formant trajectories with sex-neutral characteristics learned from sex-balanced training data. RASO preserves linguistic content and, even when assessed under a semi-informed attack model, significantly outperforms a competing approach to sex obfuscation.
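As a rough illustration of what aligning fundamental frequency distributions with sex-neutral characteristics could look like, the sketch below computes a moment-matching penalty on log-F0. The abstract does not specify the form of the regulariser, so everything here is an assumption for illustration: the function names (`log_f0_stats`, `neutral_target`, `f0_alignment_penalty`), the choice of log-F0 mean and standard deviation as the statistics to match, and the construction of the neutral target by pooling equal numbers of female and male frames.

```python
import math
from statistics import fmean, pstdev


def log_f0_stats(f0_hz):
    """Return (mean, std) of log-F0 over voiced frames (F0 > 0)."""
    voiced = [math.log(f) for f in f0_hz if f > 0]
    return fmean(voiced), pstdev(voiced)


def neutral_target(female_f0, male_f0):
    """Hypothetical neutral target: pool equal numbers of frames per sex."""
    n = min(len(female_f0), len(male_f0))  # keep the pool sex-balanced
    return log_f0_stats(female_f0[:n] + male_f0[:n])


def f0_alignment_penalty(output_f0, target_stats):
    """Assumed loss form: squared gaps in log-F0 mean and std."""
    mu, sigma = log_f0_stats(output_f0)
    mu_t, sigma_t = target_stats
    return (mu - mu_t) ** 2 + (sigma - sigma_t) ** 2


# Toy F0 contours in Hz (made-up values, not data from the paper).
female = [200.0, 210.0, 220.0]
male = [100.0, 110.0, 120.0]
target = neutral_target(female, male)

penalty_male = f0_alignment_penalty(male, target)
penalty_pooled = f0_alignment_penalty(female + male, target)
```

In this toy setup the penalty is zero for an output whose log-F0 statistics equal the pooled target and positive for an output that retains a male-typical contour. A regulariser of this kind would be added to the training objective alongside the adversarial term. Formant trajectories would need an analogous term that is not shown here.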