The availability of high-quality, AI-generated audio raises security challenges such as misinformation campaigns and voice-cloning fraud. A key defense against the misuse of AI-generated audio is watermarking, which allows it to be easily distinguished from genuine audio. Because those seeking to misuse AI-generated audio may attempt to remove these watermarks, studying effective watermark removal techniques is critical to objectively evaluating the robustness of audio watermarks. Previous watermark removal schemes typically assume access to the target watermark detector during the removal process. This assumption is often impractical, and evaluations that rely on it may lead to a false sense of confidence in current watermark schemes. We introduce HarmonicAttack, a novel audio watermark removal method that requires no access to the target watermark algorithm. It needs only a set of original and watermarked samples to train a general model capable of removing watermarks from audio samples. We also find that training samples do not need to share the same distribution as target samples: our attack generalizes to out-of-distribution samples with minimal degradation. Compared with existing watermark removal attacks, HarmonicAttack is more effective at removing watermarks embedded by state-of-the-art schemes, including AudioSeal, WavMark, SilentCipher, and AudioMarkNet, while maintaining high perceptual quality. Although HarmonicAttack is trained on the LibriSpeech dataset against AudioSeal, it generalizes across unseen datasets and watermarking schemes. For instance, on VCTK, HarmonicAttack achieves a 92% attack success rate (ASR) against AudioMarkNet, substantially outperforming the best baseline at 38%. On FMA, HarmonicAttack reaches 100% ASR against all watermarks, whereas the best baseline achieves only 2% against AudioSeal and 44% against WavMark.