Linguistic steganalysis (LS) aims to detect whether a text contains secret information. Existing LS methods focus on deep-learning model design and achieve excellent results on idealized data. However, they overlook the distinctive characteristics of individual users, which leads to weak performance on social networks. Stego texts (stegos) are also scarce in this setting, which further complicates detection. We propose UP4LS, a framework that uses a User Profile to enhance LS in realistic scenarios. Three kinds of user attributes, such as writing habits, are explored to build the profile, and a dedicated feature extraction module is designed for each attribute. The extracted features are mapped to high-dimensional user features by the deep-learning model of the method being improved, while the content feature is extracted by a language model. The user and content features are then integrated. Existing methods can therefore improve their LS results by adding the UP4LS framework without changing their deep-learning models. Experiments show that UP4LS significantly enhances the performance of LS baselines in realistic scenarios, increasing overall accuracy (Acc) by 25% and F1 by 51% and achieving state-of-the-art (SOTA) results. The improvement is especially pronounced when stegos are scarce. UP4LS also lays the groundwork for adapting SOTA methods from related tasks to efficient LS.
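The pipeline described in the abstract (per-attribute user features, a content feature, then integration) can be illustrated with a minimal sketch. This is not the authors' implementation and rests on several assumptions: the statistics in `writing_habit_features` are illustrative guesses at what a writing-habit module might compute; `content_features` is a hashed bag-of-words stand-in for a real language-model encoder; `fuse` assumes integration by simple concatenation; and the step that maps user features to a high-dimensional space through the baseline's deep-learning model is omitted. The other two user attributes are not named in the abstract, so `build_input` simply accepts any list of extractor functions. All function names are hypothetical.

```python
import hashlib
import re
from typing import Callable, List, Sequence

Vector = List[float]


def writing_habit_features(posts: Sequence[str]) -> Vector:
    """Toy writing-habit statistics over a user's post history (assumed features)."""
    words = [w for p in posts for w in re.findall(r"\w+", p)]
    n_words = max(len(words), 1)
    n_chars = max(sum(len(p) for p in posts), 1)
    n_posts = max(len(posts), 1)
    avg_word_len = sum(len(w) for w in words) / n_words
    avg_post_len = len(words) / n_posts  # mean words per post
    punct_ratio = sum(ch in ".,!?;:" for p in posts for ch in p) / n_chars
    return [avg_word_len, avg_post_len, punct_ratio]


def content_features(text: str, dim: int = 8) -> Vector:
    """Stand-in for a language-model encoder: a hashed bag of words."""
    vec = [0.0] * dim
    for w in re.findall(r"\w+", text.lower()):
        # md5 gives a hash that is stable across interpreter runs
        idx = int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    return vec


def fuse(user_vec: Vector, content_vec: Vector) -> Vector:
    """Assumed integration step: plain concatenation."""
    return list(user_vec) + list(content_vec)


def build_input(
    text: str,
    history: Sequence[str],
    extractors: Sequence[Callable[[Sequence[str]], Vector]],
    dim: int = 8,
) -> Vector:
    """Build the fused vector that would be handed to an unchanged LS classifier."""
    user_vec: Vector = []
    for extract in extractors:  # one extractor per user attribute
        user_vec.extend(extract(history))
    return fuse(user_vec, content_features(text, dim))


history = ["Nice day today!", "Coffee first, then work."]
x = build_input("See you at noon.", history, [writing_habit_features])
print(len(x))  # 3 user dimensions + 8 content dimensions
```

Because the user profile enters only through the fused input vector, the downstream classifier in this sketch needs no architectural change, which mirrors the abstract's claim that existing methods can adopt the framework without modifying their models.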