Social media increasingly disseminates information through mixed image-text posts, but rumors often exploit subtle inconsistencies and forged content, making detection based solely on post content difficult. Deep semantic-mismatch rumors, whose images and texts appear superficially aligned, pose a particular challenge and threaten online public opinion. Existing multimodal rumor detection methods improve cross-modal modeling but suffer from limited feature extraction, noisy alignment, and inflexible fusion strategies, while ignoring the external factual evidence needed to verify complex rumors. To address these limitations, we propose a multimodal rumor detection model enhanced with external evidence and forgery features. The model uses a ResNet34 visual encoder, a BERT text encoder, and a forgery-feature module that extracts frequency-domain traces and compression artifacts via the Fourier transform. BLIP-generated image descriptions bridge the image and text semantic spaces. A dual contrastive learning module computes contrastive losses between text-image and text-description pairs, improving the detection of semantic inconsistencies. A gated adaptive feature-scaling fusion mechanism dynamically adjusts multimodal fusion and reduces redundancy. Experiments on the Weibo and Twitter datasets demonstrate that our model outperforms mainstream baselines in accuracy, macro recall, and macro F1 score.
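The gated adaptive fusion mentioned above can be illustrated with a minimal sketch: a sigmoid gate computed from the concatenated modality features interpolates, per dimension, between the text and visual representations. This is a hedged NumPy illustration of the general gating idea, not the paper's exact formulation; the weight names `W_g` and `b_g` and the feature dimension are assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(text_feat, visual_feat, W_g, b_g):
    """Per-dimension gate in (0, 1) decides how much of each modality
    enters the fused vector (generic gating sketch, not the paper's
    exact mechanism)."""
    concat = np.concatenate([text_feat, visual_feat], axis=-1)
    gate = sigmoid(concat @ W_g + b_g)  # gate values lie in (0, 1)
    return gate * text_feat + (1.0 - gate) * visual_feat

# Toy usage with hypothetical dimensions and random weights.
rng = np.random.default_rng(0)
d = 8
t = rng.standard_normal(d)            # text feature
v = rng.standard_normal(d)            # visual feature
W_g = rng.standard_normal((2 * d, d)) * 0.1
b_g = np.zeros(d)
fused = gated_fusion(t, v, W_g, b_g)
```

Because the gate is bounded in (0, 1), each fused coordinate is a convex combination of the corresponding text and visual coordinates, which is what lets the mechanism suppress the weaker modality per dimension rather than averaging uniformly.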