Steganography has seen a surge of interest following recent advances in AI-powered techniques, particularly in multimodal setups that conceal a signal inside a signal of a different nature. All steganographic methods aim for perceptual transparency, robustness, and large embedding capacity, goals that often conflict and that classical methods have struggled to reconcile. This paper extends an existing image-in-audio deep steganography method, focusing on improving its robustness. The proposed enhancements are modifications to the loss function, use of the Short-Time Fourier Transform (STFT), redundancy in the encoding process for error correction, and buffering of additional information in the pixel subconvolution operation. The results demonstrate that our approach outperforms the existing method in terms of robustness and perceptual transparency.
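As a rough illustration of why redundancy in the encoding can help with error correction, the sketch below replicates a payload several times, distorts every copy independently, and recovers each value by averaging the copies. It is a toy model under stated assumptions rather than the method of the paper: the replication-and-averaging scheme, the Gaussian noise channel, and all function names (`embed_with_redundancy`, `noisy_channel`, `decode_by_averaging`) are hypothetical and chosen only for illustration.

```python
import random


def embed_with_redundancy(pixels, copies):
    """Repeat the whole payload `copies` times (hypothetical replication scheme)."""
    return [list(pixels) for _ in range(copies)]


def noisy_channel(replicas, sigma, rng):
    """Stand-in for container distortion: add Gaussian noise to every copy."""
    return [[p + rng.gauss(0.0, sigma) for p in replica] for replica in replicas]


def decode_by_averaging(replicas):
    """Recover each value as the mean of its noisy copies."""
    n = len(replicas)
    return [sum(values) / n for values in zip(*replicas)]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
pixels = [rng.random() for _ in range(4096)]  # toy "image" with values in [0, 1)

# Assumed settings: noise level 0.1, one copy versus eight copies.
single = decode_by_averaging(
    noisy_channel(embed_with_redundancy(pixels, 1), 0.1, rng)
)
redundant = decode_by_averaging(
    noisy_channel(embed_with_redundancy(pixels, 8), 0.1, rng)
)

print("MSE with 1 copy:  ", mse(pixels, single))
print("MSE with 8 copies:", mse(pixels, redundant))
```

Averaging independent noisy copies reduces the noise variance roughly in proportion to the number of copies, at the cost of embedding capacity. That is the same trade-off between robustness and capacity described above.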