As a multimodal medium combining images and text, memes frequently convey implicit harmful content through metaphor and humor, making the detection of harmful memes a complex and challenging task. Although recent studies have improved detection accuracy and interpretability, large-scale, high-quality datasets of harmful memes remain scarce, and current methods still struggle to capture implicit risks and nuanced semantics. We therefore construct MemeMind, a large-scale harmful meme dataset. Aligned with international standards and grounded in real-world internet contexts, MemeMind provides detailed Chain-of-Thought (CoT) reasoning annotations to support fine-grained analysis of the implicit intentions behind memes. Building on this dataset, we further propose MemeGuard, a reasoning-oriented multimodal detection model that significantly improves both the accuracy of harmful meme detection and the interpretability of model decisions. Extensive experiments demonstrate that MemeGuard outperforms existing state-of-the-art methods on the MemeMind dataset, establishing a solid foundation for future research on harmful meme detection.