The prevailing practice in learning-based audio watermarking is to pursue robustness by expanding the set of distortions simulated during training. However, such surrogates are narrow and prone to overfitting. This paper presents AWARE (Audio Watermarking with Adversarial Resistance to Edits), an alternative approach that avoids reliance on attack-simulation stacks and handcrafted differentiable distortions. Embedding is obtained via adversarial optimization in the time-frequency domain under a level-proportional perceptual budget. Detection employs a time-order-agnostic detector with a Bitwise Readout Head (BRH) that aggregates temporal evidence into one score per watermark bit, enabling reliable watermark decoding even under desynchronization and temporal cuts. Empirically, AWARE attains high audio quality and speech intelligibility (PESQ/STOI) and a consistently low bit error rate (BER) across various audio edits, often surpassing representative state-of-the-art learning-based audio watermarking systems.
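To make the "level-proportional perceptual budget" concrete, the following is a minimal sketch of one plausible reading: the perturbation in each time-frequency bin is bounded by a fixed fraction of that bin's own magnitude. The function name `project_to_budget`, the parameter `eps`, and the per-bin clipping rule are illustrative assumptions, not the paper's stated formulation.

```python
def project_to_budget(spec, delta, eps=0.1):
    """Clip a perturbation so that |delta[t][f]| <= eps * |spec[t][f]|.

    spec and delta are lists of rows (time frames) of complex bins.
    ASSUMPTION: the budget is a per-bin bound proportional to the host
    magnitude; the paper may define its budget differently.
    """
    projected = []
    for spec_row, delta_row in zip(spec, delta):
        row = []
        for x, d in zip(spec_row, delta_row):
            limit = eps * abs(x)   # allowed magnitude in this bin
            mag = abs(d)           # proposed magnitude in this bin
            # Keep the phase of d; shrink its magnitude only if over budget.
            row.append(d if mag <= limit else d * (limit / mag))
        projected.append(row)
    return projected


# Toy 2x2 "spectrogram" and a proposed watermark perturbation.
spec = [[1 + 0j, 0.5j], [2 + 0j, 0j]]
delta = [[0.5 + 0j, 0.01j], [0.1 + 0.1j, 0.3 + 0j]]
clipped = project_to_budget(spec, delta, eps=0.1)
```

In an iterative optimizer, a projection of this kind would typically be applied after each update step, so that loud bins can carry more watermark energy than quiet ones and silent bins carry none.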
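The idea of aggregating temporal evidence into one score per bit can be illustrated with a small sketch. It assumes that a detector has already produced one signed score per frame and per bit, and that the readout is a plain mean over time; both the mean pooling and the helper names (`bitwise_readout`, `decode_bits`, `simulate_frame_scores`) are assumptions for illustration and are not taken from the paper.

```python
import random


def bitwise_readout(frame_scores):
    """Average per-frame scores over time, giving one score per bit.

    frame_scores: list of frames, each a list of signed per-bit scores.
    ASSUMPTION: mean pooling stands in for the paper's readout head.
    """
    n_frames = len(frame_scores)
    n_bits = len(frame_scores[0])
    return [sum(frame[b] for frame in frame_scores) / n_frames
            for b in range(n_bits)]


def decode_bits(frame_scores):
    """Threshold each aggregated score at zero to recover the bits."""
    return [1 if score > 0 else 0 for score in bitwise_readout(frame_scores)]


def simulate_frame_scores(bits, n_frames, noise=0.5, seed=0):
    """Hypothetical detector output: +1/-1 per bit plus Gaussian noise."""
    rng = random.Random(seed)
    return [[(1.0 if b else -1.0) + rng.gauss(0.0, noise) for b in bits]
            for _ in range(n_frames)]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
frames = simulate_frame_scores(bits, n_frames=200)

# Simulated edits: reorder the frames, and cut out the middle half.
shuffled = frames[:]
random.Random(1).shuffle(shuffled)
cut = frames[50:150]

decoded_original = decode_bits(frames)
decoded_shuffled = decode_bits(shuffled)
decoded_cut = decode_bits(cut)
```

Because a mean over frames does not depend on frame order, the shuffled input decodes the same way as the original, and removing frames only reduces the amount of evidence averaged per bit rather than shifting bit positions. This is the sense in which such a readout is time-order-agnostic.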