FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation

The rapid development of Artificial Intelligence Generated Content (AIGC) has made high-fidelity generated audio widely available across the Internet, driving the advancement of audio steganography. Benefiting from advances in deep learning, current audio steganography schemes are mainly based on encoder-decoder network architectures. While these methods guarantee a certain level of perceptual quality for stego audio, they typically suffer from high computational cost, long implementation time, and poor anti-steganalysis performance. To address these issues, we propose FGAS, a Fixed Decoder Network-Based Audio Steganography method with Adversarial Perturbation Generation. Adversarial perturbations carrying a secret message are embedded into the cover audio to generate stego audio. The receiver only needs to share the structure and key of the fixed decoder network to accurately extract the secret message from the stego audio. In FGAS, we propose an Audio Adversarial Perturbation Generation (A2PG) strategy with an optional robust extension and design a lightweight fixed decoder. The fixed decoder guarantees reliable extraction of the hidden message, while the adversarial perturbations are optimized to keep the stego audio perceptually and statistically close to the cover audio, thereby improving anti-steganalysis performance. Experimental results show that FGAS significantly improves stego audio quality, achieving an average PSNR gain of over 10 dB compared to state-of-the-art (SOTA) methods. FGAS also demonstrates strong robustness against common audio processing attacks. In addition, FGAS exhibits superior anti-steganalysis performance across different relative payloads: under high-capacity embedding, the classification error rate is about 2% higher than with current SOTA methods, indicating stronger resistance to steganalysis.
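To make the fixed-decoder idea concrete, the following is a minimal toy sketch, not the A2PG strategy or the decoder architecture of FGAS, neither of which is specified in this abstract. It assumes a single-layer linear decoder whose weights are derived from a shared integer key, and it optimizes an additive perturbation with a hinge loss until the decoder outputs the message bits. The function names (`make_fixed_decoder`, `decode`, `embed`), the loss, and all hyperparameters are illustrative assumptions.

```python
import math
import random


def make_fixed_decoder(key, n_samples, n_bits):
    """Derive decoder weights from a shared key; they are never trained.

    Assumption: a single linear layer stands in for the fixed decoder network.
    """
    rng = random.Random(key)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_bits)]


def project(row, audio):
    """Dot product of one decoder row with the audio samples."""
    return sum(w * x for w, x in zip(row, audio))


def decode(weights, audio):
    """Extract one bit per decoder row from the sign of its projection."""
    return [1 if project(row, audio) > 0 else 0 for row in weights]


def embed(cover, bits, weights, steps=2000, lr=1e-3, margin=1.0):
    """Optimize a perturbation so that decode(cover + delta) equals bits.

    Uses subgradient descent on a hinge loss (an illustrative choice) and
    stops as soon as every bit is decoded with the requested margin, which
    keeps the perturbation from growing further than needed.
    """
    delta = [0.0] * len(cover)
    targets = [2 * b - 1 for b in bits]  # map bits {0, 1} to signs {-1, +1}
    for _ in range(steps):
        stego = [c + d for c, d in zip(cover, delta)]
        grad = [0.0] * len(cover)
        violated = False
        for row, t in zip(weights, targets):
            if t * project(row, stego) < margin:  # hinge term is active
                violated = True
                for i, w in enumerate(row):
                    grad[i] -= t * w
        if not violated:
            break
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return [c + d for c, d in zip(cover, delta)]


# Usage: the sender embeds; the receiver rebuilds the decoder from the key.
n_samples, n_bits, key = 256, 16, 1234
cover = [0.5 * math.sin(2 * math.pi * 5 * i / n_samples)
         for i in range(n_samples)]
msg_rng = random.Random(7)
bits = [msg_rng.randint(0, 1) for _ in range(n_bits)]

weights = make_fixed_decoder(key, n_samples, n_bits)
stego = embed(cover, bits, weights)

receiver_weights = make_fixed_decoder(key, n_samples, n_bits)
print("message recovered:", decode(receiver_weights, stego) == bits)
```

In this sketch only the perturbation is optimized; the decoder is regenerated from the key on both sides, so no trained encoder or decoder weights have to be exchanged. A realistic system would also add perceptual and statistical constraints on the perturbation, which the toy hinge loss does not model.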
