Towards Next-Generation Steganalysis: LLMs Unleash the Power of Detecting Steganography

Linguistic steganography offers a convenient way to hide messages, particularly with the emergence of AI generation technology. The potential abuse of this technology raises security concerns, calling for powerful linguistic steganalysis to detect carriers that contain steganographic messages. Existing methods are limited to finding distribution differences between steganographic texts and normal texts in terms of symbolic statistics. However, these distribution differences are hard to model precisely, which severely hurts the detection ability of existing methods in realistic scenarios. To find a feasible way to build practical steganalysis for the real world, this paper proposes to employ the human-like text processing abilities of large language models (LLMs) to capture the difference from the perspective of human perception, in addition to the traditional statistical perspective. Specifically, we systematically investigate the performance of LLMs on this task by modeling it as a generative paradigm instead of the traditional classification paradigm. Extensive experimental results reveal that generative LLMs exhibit significant advantages in linguistic steganalysis and show performance trends distinct from those of traditional approaches. The results also reveal that LLMs outperform existing baselines by a wide margin, and that the domain-agnostic ability of LLMs makes it possible to train a generic steganalysis model. (Code and trained models are openly available at https://github.com/ba0z1/Linguistic-Steganalysis-with-LLMs.)
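To make the "generative paradigm" concrete, the following is a minimal sketch of how detection might be posed as text generation rather than classification: the candidate text is wrapped in an instruction, the model generates a label word, and that word is mapped back to a decision. The prompt wording, the label words, and the names `build_prompt`, `parse_label`, `detect`, and `dummy_generate` are all assumptions made for illustration; they are not taken from the paper or its repository, and the stand-in generator is not a real LLM.

```python
from typing import Callable, Optional

# Hypothetical label words; the paper's actual prompt and labels may differ.
LABELS = {"steganographic": 1, "normal": 0}


def build_prompt(text: str) -> str:
    """Wrap a candidate text in an instruction that asks for a label word."""
    return (
        "Decide whether the following text is steganographic or normal.\n"
        f"Text: {text}\n"
        "Answer with one word (steganographic or normal):"
    )


def parse_label(generated: str) -> Optional[int]:
    """Map generated text to 1 (steganographic), 0 (normal), or None if unclear."""
    words = generated.strip().lower().split()
    if not words:
        return None
    # Only the first generated word is inspected, with punctuation stripped.
    first = words[0].strip(".,:;!\"'")
    return LABELS.get(first)


def detect(text: str, generate: Callable[[str], str]) -> Optional[int]:
    """Run one detection: build the prompt, generate, parse the answer."""
    return parse_label(generate(build_prompt(text)))


def dummy_generate(prompt: str) -> str:
    """Placeholder standing in for an LLM call; it always answers 'Normal.'."""
    return "Normal."


if __name__ == "__main__":
    print(detect("The weather is lovely today.", dummy_generate))
```

In a real setup, `generate` would be replaced by a call to a fine-tuned LLM. The point of the sketch is only that the decision is read from generated text instead of from a classification head.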

