In this work, we examine hateful memes from three complementary angles: how to detect them, how to explain their content, and how to intervene before they are posted. We do so by applying a range of strategies built on top of generative AI models. To the best of our knowledge, explanation and intervention have typically been studied separately from detection, which does not reflect real-world conditions. Further, since curating large annotated datasets for meme moderation is prohibitively expensive, we propose a novel framework that leverages task-specific generative multimodal agents and the few-shot adaptability of large multimodal models to handle different types of memes. We believe this is the first work focused on generalizable hateful meme moderation under limited data conditions, and it has strong potential for deployment in real-world production scenarios. Warning: this paper contains potentially toxic content.