The rapid development of image generation models has led to the widespread sharing of generated images on social networks, creating favorable conditions for provably secure image steganography. However, existing methods suffer from low-quality generated images and a lack of semantic control over the generation process. To bring provably secure steganography to more effective, higher-performance image generation models, and to ensure that secret messages can be accurately extracted from stego images even after they are uploaded to social networks and subjected to lossy processing such as JPEG compression, we propose a high-quality, provably secure, and robust image steganography method built on state-of-the-art autoregressive (AR) image generation models that use Vector-Quantized (VQ) tokenizers. Additionally, we employ a cross-modal error-correction framework that generates stego text from stego images; the stego text aids in restoring the lossy images, ultimately enabling extraction of the secret messages embedded in them. Extensive experiments demonstrate that the proposed method offers advantages in stego quality, embedding capacity, and robustness while preserving provable undetectability.