In the face of escalating surveillance and censorship in cyberspace, personal privacy is under siege, motivating the development of steganography, which hides messages securely within innocent-looking text. Previous methods alter existing text to hide private messages, which is not secure. Large Language Models (LLMs) provide high-quality, explicit probability distributions, a mathematical tool that secure steganography methods can build on. However, existing attempts fail to achieve high capacity, time efficiency, and correctness simultaneously, and their tightly coupled designs leave little room for refinement. To provide a secure, high-capacity, and efficient steganography method, we introduce ShiMer. Specifically, ShiMer pseudorandomly shifts the probability interval of the LLM's distribution to obtain a private distribution, and samples a token according to the private bits. The steganographic texts that ShiMer produces are indistinguishable in quality from normal texts generated directly by the language model. To further increase ShiMer's capacity, we design a reordering algorithm that minimizes interval splitting during the decoding phase. Experimental results indicate that our method achieves the highest capacity and efficiency among existing secure steganography techniques.
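The interval-shifting idea in the abstract can be illustrated with a toy example. The sketch below is not ShiMer's actual algorithm, which the abstract does not specify in detail; it assumes a fixed three-token distribution in place of an LLM, a shift derived from a shared key with Python's non-cryptographic `random` module, and a brute-force decoder. The function names (`cumulative_intervals`, `shared_shift`, `embed_step`, `extract_step`) are hypothetical, and the reordering step is omitted.

```python
import os
import random
from fractions import Fraction


def cumulative_intervals(probs):
    """Map each token to its half-open interval [lo, hi) on [0, 1)."""
    intervals, lo = {}, Fraction(0)
    for token, p in probs.items():
        intervals[token] = (lo, lo + p)
        lo += p
    return intervals


def shared_shift(key, step):
    """Derive a pseudorandom shift in [0, 1) from a shared key (assumption:
    a real system would use a cryptographically secure generator)."""
    rng = random.Random(f"{key}:{step}")
    return Fraction(rng.getrandbits(32), 2 ** 32)


def embed_step(probs, bits, shift):
    """Pick the token whose interval contains the shifted message point."""
    point = Fraction(int(bits, 2), 2 ** len(bits))  # bits as a point in [0, 1)
    u = (point + shift) % 1                         # shift, wrapping around 1
    for token, (lo, hi) in cumulative_intervals(probs).items():
        if lo <= u < hi:
            return token
    raise ValueError("probabilities must sum to 1")


def extract_step(probs, token, shift, n):
    """Return the bit prefix shared by every n-bit message that maps to token."""
    lo, hi = cumulative_intervals(probs)[token]
    candidates = [
        format(k, f"0{n}b")
        for k in range(2 ** n)
        if lo <= (Fraction(k, 2 ** n) + shift) % 1 < hi
    ]
    # Only the common prefix of all candidates is unambiguous to the receiver.
    return os.path.commonprefix(candidates)
```

A sender and receiver who share the key would call `shared_shift("some-key", step)` at each generation step, then `embed_step` and `extract_step` respectively. In this toy setting, when the shifted interval of a token wraps around 1, the candidate messages fall into two separate groups and their common prefix can be empty, so that step carries no recoverable bits. This may be related to the interval splitting the abstract mentions, though the abstract does not define the term.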