Recent studies have proposed models that achieve promising performance on the hateful meme classification task. Nevertheless, these models do not generate interpretable explanations that uncover a meme's underlying meaning and support the classification output. A major reason for the lack of explainable hateful meme methods is the absence of a hateful meme dataset that contains ground-truth explanations for benchmarking or training. Intuitively, such explanations can educate and assist content moderators in interpreting and removing flagged hateful memes. This paper addresses this research gap by introducing the Hateful meme with Reasons Dataset (HatReD), a new multimodal hateful meme dataset annotated with the underlying hateful contextual reasons. We also define a new conditional generation task that aims to automatically generate the underlying reasons that explain hateful memes, and we establish the baseline performance of state-of-the-art pre-trained language models on this task. We further demonstrate the usefulness of HatReD by analyzing the challenges that the new conditional generation task poses when explaining memes in seen and unseen domains. The dataset and benchmark models are available at https://github.com/Social-AI-Studio/HatRed