Hateful meme classification is a challenging multimodal task that requires complex reasoning and contextual background knowledge. Ideally, we could leverage an explicit external knowledge base to supplement contextual and cultural information in hateful memes. However, there is no known explicit external knowledge base that could provide such hate speech contextual information. To address this gap, we propose PromptHate, a simple yet effective prompt-based model that prompts pre-trained language models (PLMs) for hateful meme classification. Specifically, we construct simple prompts and provide a few in-context examples to exploit the implicit knowledge in the pre-trained RoBERTa language model for hateful meme classification. We conduct extensive experiments on two publicly available hateful and offensive meme datasets. Our experimental results show that PromptHate is able to achieve a high AUC of 90.96, outperforming state-of-the-art baselines on the hateful meme classification task. We also perform fine-grained analyses and case studies on various prompt settings and demonstrate the effectiveness of the prompts on hateful meme classification.
翻译:仇恨表情包分类是一项具有挑战性的多模态任务,需要复杂的推理能力和上下文背景知识。理想情况下,我们可以利用显式的外部知识库来补充仇恨表情包中的上下文和文化信息。然而,目前尚无已知的显式外部知识库能够提供此类仇恨言论的上下文信息。为弥补这一空白,我们提出PromptHate——一种简洁而有效的基于提示的模型,通过提示预训练语言模型(PLMs)进行仇恨表情包分类。具体而言,我们构建简单的提示并提供少量上下文示例,以挖掘预训练RoBERTa语言模型中的隐含知识,从而完成仇恨表情包分类。我们在两个公开的仇恨与攻击性表情包数据集上开展了广泛实验。实验结果表明,PromptHate在仇恨表情包分类任务中能够达到90.96的高AUC值,性能超越当前最先进的基线模型。我们还在多种提示设置下进行了细粒度分析与案例研究,验证了提示机制在仇恨表情包分类中的有效性。