In audio watermarking, it is challenging to encode imperceptible messages while simultaneously enhancing message capacity and robustness. Although recent deep learning-based methods improve message capacity and robustness over traditional methods, the encoded messages introduce audible artefacts that restrict their use in professional settings. In this study, we introduce three key innovations. First, our work is the first deep learning-based model to integrate psychoacoustic model-based thresholding to achieve imperceptible watermarks. Second, we introduce pseudo-differentiable compression layers, enhancing the robustness of our watermarking algorithm. Third, we introduce a method that eliminates the need for perceptual losses, enabling us to achieve state-of-the-art (SOTA) results in both robustness and imperceptibility. These contributions lead to SilentCipher, a model that enables users to encode messages within audio signals sampled at 44.1 kHz.
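The psychoacoustic thresholding idea can be illustrated with a minimal sketch: keep the watermark's spectral magnitude below a per-bin threshold derived from the carrier signal. This is a simplification for intuition only; the function name, the fixed margin, and the use of the raw carrier magnitude as a stand-in for a full psychoacoustic masking curve are all assumptions, not the paper's actual model.

```python
import numpy as np

def apply_masking_threshold(carrier_mag, wm_mag, margin_db=-20.0):
    """Cap the watermark spectrum at `margin_db` below the carrier's
    per-bin magnitude (a crude stand-in for a psychoacoustic masking
    curve computed from critical bands and tonality)."""
    # Convert the dB margin to a linear scale factor, e.g. -20 dB -> 0.1.
    threshold = carrier_mag * (10.0 ** (margin_db / 20.0))
    # Any watermark energy above the threshold would risk being audible,
    # so clip it bin-by-bin.
    return np.minimum(wm_mag, threshold)
```

In a real system the threshold would come from a proper psychoacoustic model evaluated on the carrier's short-time spectrum, so louder passages can hide more watermark energy than quiet ones.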