The increasing realism of synthetic speech, driven by advances in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution by embedding human-imperceptible watermarks into AI-generated audio. However, the robustness of audio watermarking against common and adversarial perturbations remains understudied. We present AudioMarkBench, the first systematic benchmark for evaluating the robustness of audio watermarking against watermark removal and watermark forgery. AudioMarkBench includes a new dataset created from Common Voice that spans languages, biological sexes, and ages; 3 state-of-the-art watermarking methods; and 15 types of perturbations. We benchmark the robustness of these methods against the perturbations in no-box, black-box, and white-box settings. Our findings highlight the vulnerabilities of current watermarking techniques and emphasize the need for more robust and fair audio watermarking solutions. Our dataset and code are publicly available at https://github.com/moyangkuo/AudioMarkBench.