In the era of Large Language Models (LLMs), generative linguistic steganography has become a prevalent technique for hiding information within model-generated texts. However, traditional steganography methods struggle to align steganographic texts (stegos) with original model-generated texts, because the predicted probability distributions of LLMs have low entropy. This reduces embedding capacity and makes it difficult to decode stegos in real-world communication channels. To address these challenges, we propose a semantic steganography framework based on LLMs, which constructs a semantic space and maps secret messages onto this space using ontology-entity trees. This framework offers robust and reliable transmission over complex channels, as well as resistance to text rendering and word blocking. Additionally, the stegos generated by our framework are indistinguishable from the covers, achieve a higher embedding capacity than state-of-the-art steganography methods, and are of higher quality.