Despite remarkable progress in steganography, embedding semantically rich, sentence-level information into carriers remains a challenging problem. In this work, we introduce the concept of Semantic Steganography, which aims to hide semantically meaningful, structured content, such as sentences or paragraphs, in cover media. Building on this concept, we present Sentence-to-Image Steganography, an instance that hides arbitrary sentence-level messages within a cover image. To this end, we propose S^2LM, the Semantic Steganographic Language Model, which leverages large language models (LLMs) to embed high-level textual information into images. Unlike traditional bit-level approaches, S^2LM redesigns the entire pipeline, involving the LLM throughout to enable the hiding and recovery of arbitrary sentences. Furthermore, we establish a benchmark named Invisible Text (IVT), comprising a diverse set of sentence-level secret messages for evaluating semantic steganography methods. Experimental results demonstrate that S^2LM effectively enables direct sentence recovery beyond bit-level steganography. The source code and the IVT dataset will be released soon.