The prevailing practice in learning-based audio watermarking is to pursue robustness by expanding the set of simulated distortions used during training. However, such surrogates are narrow and prone to overfitting. This paper presents AWARE (Audio Watermarking with Adversarial Resistance to Edits), an alternative approach that avoids reliance on attack-simulation stacks and handcrafted differentiable distortions. Embedding is obtained through adversarial optimization in the time-frequency domain under a level-proportional perceptual budget. Detection employs a time-order-agnostic detector with a Bitwise Readout Head (BRH) that aggregates temporal evidence into one score per watermark bit, enabling reliable watermark decoding even under desynchronization and temporal cuts. Empirically, AWARE attains high audio quality and speech intelligibility (PESQ/STOI) and a consistently low bit error rate (BER) across various audio edits, often surpassing representative state-of-the-art learning-based systems.
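To make the idea of a time-order-agnostic readout concrete, the following is a minimal sketch and not the authors' implementation. It assumes the detector emits one logit per watermark bit for every time frame, and that the Bitwise Readout Head pools those logits with a plain mean over time; the abstract does not specify the pooling operator, so the mean is an assumption, as are the names `bitwise_readout` and `decode_bits` and the synthetic data. Because the mean is symmetric in its inputs, the per-bit scores do not depend on frame order, and dropping a block of frames only reduces the amount of evidence being averaged.

```python
import random


def bitwise_readout(frame_logits):
    """Pool per-frame, per-bit logits into one score per bit.

    frame_logits: list of frames, each a list of n_bits floats.
    Mean pooling is an assumed stand-in for the BRH aggregation; it is
    symmetric in the frames, so frame order does not affect the result.
    """
    n_frames = len(frame_logits)
    n_bits = len(frame_logits[0])
    return [
        sum(frame[b] for frame in frame_logits) / n_frames
        for b in range(n_bits)
    ]


def decode_bits(scores):
    """Threshold each pooled score at zero to obtain a hard bit decision."""
    return [1 if s > 0 else 0 for s in scores]


# Synthetic (hypothetical) detector output: each frame carries a noisy
# +1/-1 logit for every bit of an 8-bit message.
rng = random.Random(0)
message = [1, 0, 1, 1, 0, 0, 1, 0]
frames = [
    [(1.0 if bit else -1.0) + rng.gauss(0.0, 0.5) for bit in message]
    for _ in range(200)
]

# Desynchronization stand-in: the same frames in a random order.
shuffled = frames[:]
rng.shuffle(shuffled)

# Temporal cut stand-in: remove 80 consecutive frames from the middle.
cut = frames[:60] + frames[140:]

decoded_full = decode_bits(bitwise_readout(frames))
decoded_shuffled = decode_bits(bitwise_readout(shuffled))
decoded_cut = decode_bits(bitwise_readout(cut))
```

In this toy setting all three decodings recover the embedded message. A learned head could replace the mean with another permutation-invariant pooling (for example a weighted or attention-style average) without changing the order-agnostic property.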