Linguistic steganography (LS) aims to generate steganographic text (stego) that carries secret information. Only authorized recipients can perceive the existence of the stego and extract the secret, thereby preserving privacy. However, existing LS methods do not support controllable generation of stego with specific discourse attributes such as style, genre, and theme, and they struggle to simulate high-quality natural text. As a result, the stego is easily perceived and detected, compromising covert communication. This paper proposes LLsM, the first LS method built on a Large Language Model (LLM). For open-source LLMs, we reconstruct the LLM's token generator into a "stego generator" that controls stego generation based on the secret. In this stego generator, the candidate pool is encoded by range coding, with an adjustment factor governing the interval lengths; the secret selects an interval and thereby determines the next token. This better simulates the distribution of natural text and allows the embedding rate to be adjusted. In addition, we present a preliminary LLsM-c architecture for closed-source LLMs: it encodes discourse attributes derived from the secret into high-quality prompts and generates purely natural text containing that discourse. Experiments show that LLsM outperforms prevalent LS and related-task baselines in various measures of concealment and anti-steganalysis: its MAUVE score surpasses baselines by 60%-80%, and its anti-steganalysis performance exceeds baselines by 20%-30%. Notably, LLsM can also generate longer stego of high quality, demonstrating its advantages in understanding and coherence.
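The range-coding step of the stego generator can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's exact algorithm): the candidate pool's token probabilities are reweighted by an assumed adjustment factor `alpha`, the reweighted probabilities partition [0, 1) into cumulative intervals, and the secret bits, read as a binary fraction, select the interval whose token is emitted next.

```python
from fractions import Fraction

def embed_next_token(candidates, secret_bits, alpha=1.0):
    """Pick the next token by mapping secret bits into a range-coded
    partition of the candidate pool (illustrative sketch only).
    `candidates` is a list of (token, probability) pairs from the LLM;
    `alpha` is an assumed adjustment factor that sharpens or flattens
    the interval lengths and hence the embedding rate."""
    # Reweight probabilities with the adjustment factor.
    weights = [p ** alpha for _, p in candidates]
    total = sum(weights)
    # Interpret the secret bits as a binary fraction in [0, 1).
    numerator = int(secret_bits, 2) if secret_bits else 0
    point = Fraction(numerator, 2 ** max(len(secret_bits), 1))
    # Walk the cumulative intervals; the interval containing `point`
    # determines the next token.
    low = Fraction(0)
    for (token, _), w in zip(candidates, weights):
        high = low + Fraction(w) / Fraction(total)
        if point < high:
            return token
        low = high
    return candidates[-1][0]  # guard against rounding at the top edge

# Example with a hypothetical 3-token candidate pool:
pool = [("the", 0.5), ("a", 0.3), ("one", 0.2)]
print(embed_next_token(pool, "10"))  # 0.10b = 0.5 falls in "a"'s interval
```

Extraction mirrors this process: the recipient, holding the same model and candidate pool, recovers which interval the observed token occupies and decodes the secret bits from the interval bounds.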