The proliferation of Internet memes in the age of social media necessitates effective identification of harmful ones. Due to the dynamic nature of memes, existing data-driven models may struggle in low-resource scenarios where only a few labeled examples are available. In this paper, we propose an agency-driven framework for low-resource harmful meme detection, employing both outward and inward analysis with few-shot annotated samples. Inspired by the powerful capacity of Large Multimodal Models (LMMs) in multimodal reasoning, we first retrieve related memes with annotations to leverage label information as auxiliary signals for the LMM agent. Then, we elicit knowledge-revising behavior within the LMM agent to derive well-generalized insights into meme harmfulness. By combining these strategies, our approach enables dialectical reasoning over intricate and implicit harm-indicative patterns. Extensive experiments on three meme datasets demonstrate that our proposed approach outperforms state-of-the-art methods on the low-resource harmful meme detection task.