Read as You See: Guiding Unimodal LLMs for Low-Resource Explainable Harmful Meme Detection

Detecting harmful memes is crucial for safeguarding the integrity and harmony of online environments, yet existing detection methods are often resource-intensive and inflexible and lack explainability, which limits their usefulness for real-world web content moderation. We propose U-CoT+, a resource-efficient framework that prioritizes accessibility, flexibility, and transparency in harmful meme detection by harnessing the capabilities of lightweight unimodal large language models (LLMs). Instead of directly prompting or fine-tuning large multimodal models (LMMs) as black-box classifiers, we avoid reasoning directly over complex visual inputs and decouple meme content recognition from meme harmfulness analysis through a high-fidelity meme-to-text pipeline, in which lightweight LMMs and LLMs collaborate to convert multimodal memes into natural language descriptions that preserve critical visual information, enabling text-only LLMs to "see" memes by "reading". Grounded in these textual inputs, we further guide unimodal LLMs' reasoning under zero-shot Chain-of-Thought (CoT) prompting with targeted, interpretable, context-aware, and easily obtained human-crafted guidelines. This yields accountable step-by-step rationales while enabling flexible and efficient adaptation to diverse sociocultural criteria of harmfulness. Extensive experiments on seven benchmark datasets show that U-CoT+ achieves performance comparable to resource-intensive baselines, highlighting its effectiveness and potential as a scalable, explainable, and low-resource solution for harmful meme detection.
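The two-stage design described in the abstract (meme-to-text conversion, then guideline-driven zero-shot CoT reasoning over text) might be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the names `build_cot_prompt`, `parse_verdict`, `detect`, and `Verdict`, the prompt wording, and the `Answer:` output convention are all assumptions made for this example. The `describe` and `reason` callables stand in for a lightweight LMM and a text-only LLM, respectively.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    harmful: bool
    rationale: str  # the full step-by-step response returned by the LLM


def build_cot_prompt(description: str, guidelines: List[str]) -> str:
    """Compose a zero-shot CoT prompt (hypothetical wording)."""
    rules = "\n".join(f"{i}. {g}" for i, g in enumerate(guidelines, start=1))
    return (
        "Meme description:\n"
        f"{description}\n\n"
        "Guidelines:\n"
        f"{rules}\n\n"
        "Let's think step by step, then finish with a final line "
        "'Answer: harmful' or 'Answer: harmless'."
    )


def parse_verdict(response: str) -> Verdict:
    """Read the label from the last line; default to harmless if absent."""
    lines = response.strip().splitlines()
    last = lines[-1].lower() if lines else ""
    _, _, label = last.partition("answer:")
    harmful = label.strip().rstrip(".") == "harmful"
    return Verdict(harmful=harmful, rationale=response.strip())


def detect(
    meme_path: str,
    describe: Callable[[str], str],  # assumed: LMM wrapper, image path -> text
    reason: Callable[[str], str],    # assumed: text-only LLM wrapper, prompt -> text
    guidelines: List[str],
) -> Verdict:
    description = describe(meme_path)                    # stage 1: meme-to-text
    prompt = build_cot_prompt(description, guidelines)   # stage 2: guided CoT
    return parse_verdict(reason(prompt))
```

Because the guidelines are passed as plain strings, adapting such a sketch to a different notion of harmfulness would amount to swapping the list, with no retraining step.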
