Linguistic steganalysis (LS) aims to detect stegos generated by linguistic steganography. Existing LS methods overlook distinctive user characteristics, which leads to weak performance in social networks. The scarcity of stegos further complicates detection. In this paper, we propose UP4LS, a novel framework that uses a User Profile to enhance LS performance. Specifically, by analyzing post content, we explore user attributes such as writing habits, psychological states, and focal areas, and thereby build a user profile for LS. For each attribute, we design a dedicated feature extraction module. The extracted features are mapped to high-dimensional user features via the deep-learning networks of existing methods. A language model is then employed to extract content features. The user and content features are integrated to optimize the feature representation. During the training phase, we prioritize the distribution of stegos. Experiments demonstrate that UP4LS significantly enhances the performance of existing methods, with an overall accuracy improvement of nearly 25%. The improvement is especially pronounced when fewer stego samples are available. UP4LS also sets the stage for studies on related tasks, encouraging broader application to LS tasks.
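To make the pipeline described in the abstract concrete, the following is a minimal sketch of the two ideas it names: integrating user and content features, and prioritizing stegos during training. The abstract does not specify the fusion operator or how the stego distribution is prioritized, so this sketch assumes simple concatenation and an inverse-frequency class weight in the loss. All function names here are hypothetical and are not taken from UP4LS.

```python
import math

def fuse(user_vec, content_vec):
    # Assumption: fusion by concatenation; the abstract only says the
    # user and content features are "integrated".
    return list(user_vec) + list(content_vec)

def predict_stego_prob(features, weights, bias):
    # Logistic score standing in for the downstream classifier.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def inverse_frequency_weight(labels):
    # Assumption: stegos (label 1) are up-weighted by the cover/stego ratio.
    n_stego = sum(labels)
    n_cover = len(labels) - n_stego
    return n_cover / max(n_stego, 1)

def weighted_bce(probs, labels, stego_weight):
    # Binary cross-entropy in which each stego sample counts
    # stego_weight times as much as a cover sample.
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        w = stego_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        norm += w
    return total / norm

# Toy usage: one rare stego among three covers.
labels = [0, 0, 0, 1]
features = [fuse([0.2, 0.1], [0.5]) for _ in labels]
probs = [predict_stego_prob(f, [0.0, 0.0, 0.0], 0.0) for f in features]
loss = weighted_bce(probs, labels, inverse_frequency_weight(labels))
```

With this weighting, a batch dominated by cover texts still contributes as much loss mass from stegos as from covers, which is one plausible reading of why gains would be larger when stego samples are few.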