Internet memes have become a popular multimodal medium, yet they are increasingly weaponized to convey harmful opinions through subtle rhetorical devices such as irony and metaphor. Existing detection approaches, including those based on Multimodal Large Language Models (MLLMs), struggle with these implicit expressions and frequently misjudge them. This paper introduces PatMD, a novel approach that detects harmful memes by learning from potential misjudgment risks and proactively mitigating them. Our core idea is to move beyond superficial content-level matching and instead identify the underlying misjudgment risk patterns, guiding the MLLM away from known pitfalls. We first construct a knowledge base in which each meme is deconstructed into a misjudgment risk pattern that explains why it might be misjudged, either by overlooking harmful undertones (false negative) or by overinterpreting benign content (false positive). For a given target meme, PatMD retrieves relevant patterns and uses them to dynamically guide the MLLM's reasoning. Experiments on a benchmark of 6,626 memes across 5 harmful meme detection tasks show that PatMD outperforms state-of-the-art baselines, with average improvements of 8.30% in F1-score and 7.71% in accuracy, while remaining consistently robust on unseen and adversarial memes.
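The retrieve-then-guide idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `RiskPattern` schema, the bag-of-words cosine similarity, the function names, and the prompt wording are all assumptions chosen to keep the example self-contained, and a real system would presumably use multimodal embeddings and an actual MLLM call.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class RiskPattern:
    """One knowledge-base entry (hypothetical schema, not taken from the paper)."""
    description: str  # why a meme of this kind tends to be misjudged
    risk_type: str    # "false_negative" or "false_positive"


def bag_of_words(text: str) -> Counter:
    # Toy text representation (assumption); stands in for learned embeddings.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_patterns(meme_summary: str, knowledge_base: list, k: int = 2) -> list:
    # Rank stored risk patterns by similarity to the target meme's summary.
    query = bag_of_words(meme_summary)
    ranked = sorted(
        knowledge_base,
        key=lambda p: cosine(query, bag_of_words(p.description)),
        reverse=True,
    )
    return ranked[:k]


def build_guided_prompt(meme_summary: str, patterns: list) -> str:
    # Fold the retrieved patterns into the prompt as explicit pitfalls to check.
    lines = [
        "You are judging whether a meme is harmful.",
        f"Meme: {meme_summary}",
        "Known misjudgment risks to check before answering:",
    ]
    for i, p in enumerate(patterns, 1):
        lines.append(f"{i}. [{p.risk_type}] {p.description}")
    lines.append("Answer 'harmful' or 'harmless' with a brief justification.")
    return "\n".join(lines)


# Illustrative knowledge base; the entries are invented for this sketch.
KB = [
    RiskPattern("ironic praise that hides an insult toward a group", "false_negative"),
    RiskPattern("dark humor about food that targets no group", "false_positive"),
    RiskPattern("animal metaphor used to dehumanize a group", "false_negative"),
]

meme = "caption uses ironic praise to insult a group"
top = retrieve_patterns(meme, KB, k=1)
prompt = build_guided_prompt(meme, top)
```

The resulting `prompt` string would then be passed to the MLLM together with the meme image. Matching on risk descriptions rather than on raw meme content is meant to mirror the abstract's distinction between pattern-level retrieval and content-level matching.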