While existing audio watermarking techniques achieve strong robustness against traditional digital signal processing (DSP) attacks, they remain vulnerable to neural resynthesis. Modern neural audio codecs act as semantic filters: they discard the imperceptible waveform variations on which prior watermarking methods rely. To address this limitation, we propose Latent-Mark, the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark in the codec's invariant latent space. We achieve this by optimizing the audio waveform to induce a detectable directional shift in its encoded latent representation, while constraining the perturbation to align with the natural audio manifold to ensure imperceptibility. To prevent overfitting to a single codec's quantization rules, we introduce Cross-Codec Optimization, which jointly optimizes the waveform across multiple surrogate codecs to target shared latent invariants. Extensive evaluations demonstrate robust zero-shot transfer to unseen neural codecs and state-of-the-art resilience against traditional DSP attacks, while preserving imperceptibility. We hope this work inspires future research into universal watermarking frameworks that maintain their integrity under increasingly complex and diverse generative distortions.
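The abstract does not specify the objective, the detector, or the codecs used, so the following is only a toy sketch of the general idea of cross-codec latent-shift embedding, under loudly stated assumptions. Each surrogate codec is a hypothetical *linear* encoder `E_k = A + B_k` (a shared component `A` plus a codec-specific part `B_k`), with no quantization and no decoder. The watermark is a shift of the latent along a secret unit vector `key`. The manifold constraint is replaced by a much cruder L2 budget `eps` on the waveform perturbation, and detection simply thresholds the cosine between the latent and `key`. All names (`make_codecs`, `mean_gradient`, `embed`, `detect`) are illustrative and are not part of Latent-Mark.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def encode(matrix, wave):
    """Hypothetical toy linear encoder: latent z = E x (no quantization)."""
    return [dot(row, wave) for row in matrix]


def make_codecs(count, d, n, shared_scale, private_scale, rng):
    """Hypothetical codec family E_k = A + B_k sharing a common component A."""
    shared = [[rng.gauss(0.0, shared_scale) for _ in range(n)] for _ in range(d)]
    return [[[a + rng.gauss(0.0, private_scale) for a in row] for row in shared]
            for _ in range(count)]


def mean_gradient(codecs, key):
    """Gradient of mean_k key . (E_k x) w.r.t. x, i.e. mean_k E_k^T key.
    It does not depend on x because the toy encoders are linear."""
    n = len(codecs[0][0])
    grad = [0.0] * n
    for codec in codecs:
        for row, k in zip(codec, key):
            for j in range(n):
                grad[j] += k * row[j] / len(codecs)
    return grad


def embed(wave, codecs, key, eps, lr=0.1, steps=50):
    """Cross-codec projected gradient ascent on the key projection.
    The L2 ball of radius eps is a crude stand-in for the manifold constraint."""
    grad = mean_gradient(codecs, key)
    delta = [0.0] * len(wave)
    for _ in range(steps):
        delta = [d + lr * g for d, g in zip(delta, grad)]
        size = norm(delta)
        if size > eps:  # project back onto the perturbation budget
            delta = [d * eps / size for d in delta]
    return [s + d for s, d in zip(wave, delta)]


def detect(codec, wave, key, threshold=0.5):
    """Zero-bit decision: is the cosine between latent and key above threshold?"""
    z = encode(codec, wave)
    if norm(z) == 0.0 or norm(key) == 0.0:
        return False
    return dot(z, key) / (norm(z) * norm(key)) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    d, n = 8, 64
    # Three surrogate codecs plus one held-out codec from the same toy family.
    *surrogates, unseen = make_codecs(4, d, n, 1.0, 0.1, rng)
    raw_key = [rng.gauss(0.0, 1.0) for _ in range(d)]
    key_norm = norm(raw_key)
    key = [k / key_norm for k in raw_key]
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    marked = embed(clean, surrogates, key, eps=2.0)
    for name, codec in [("surrogate 0", surrogates[0]), ("unseen", unseen)]:
        shift = dot(encode(codec, marked), key) - dot(encode(codec, clean), key)
        print(f"{name}: key projection shift = {shift:.2f}")
```

Because the held-out codec is built from the same shared component `A`, the latent shift transfers to it by construction; the toy encodes the assumption that codecs share latent invariants rather than demonstrating it. Averaging the gradient over several surrogates damps the codec-specific `B_k` terms, which is the intuition behind optimizing across multiple codecs instead of one.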