Warning: this paper contains content that may be offensive or upsetting. Online platforms have been effectively weaponized in a variety of geopolitical events and social issues, and in this context Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging state-of-the-art textual and visual models to represent the individual cases. We study the relevance of our modular and explainable models for detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance of example- and prototype-based methods, and of text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.
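To make the distinction between example-based and prototype-based reasoning concrete, the following is a minimal sketch and not the architecture described in this paper. It assumes that each meme is already represented by a text embedding and an image embedding, that the two are fused by simple concatenation, and that a prototype is the mean embedding of a class. All of these choices, the function names, and the toy vectors and labels are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuse(text_emb, image_emb):
    """Assumed fusion: concatenate the normalized text and image embeddings."""
    return normalize(text_emb) + normalize(image_emb)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def example_based_predict(query, train_embs, train_labels, k=3):
    """Majority vote over the k most similar training cases.

    The indices of the retrieved cases are returned as well, since they
    are what a user would inspect as the explanation.
    """
    ranked = sorted(range(len(train_embs)),
                    key=lambda i: cosine(query, train_embs[i]),
                    reverse=True)
    neighbours = ranked[:k]
    votes = Counter(train_labels[i] for i in neighbours)
    return votes.most_common(1)[0][0], neighbours


def build_prototypes(train_embs, train_labels):
    """Assumed prototypes: one mean embedding per class."""
    groups = defaultdict(list)
    for emb, label in zip(train_embs, train_labels):
        groups[label].append(emb)
    return {label: [sum(col) / len(embs) for col in zip(*embs)]
            for label, embs in groups.items()}


def prototype_based_predict(query, prototypes):
    """Return the label of the most similar class prototype."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Toy embeddings standing in for the outputs of real text and vision encoders.
train_embs = [
    fuse([1.0, 0.0], [1.0, 0.0]),
    fuse([0.9, 0.1], [0.8, 0.2]),
    fuse([0.0, 1.0], [0.0, 1.0]),
    fuse([0.1, 0.9], [0.2, 0.8]),
]
train_labels = ["hateful", "hateful", "not_hateful", "not_hateful"]
query = fuse([0.95, 0.05], [0.9, 0.1])

example_label, neighbours = example_based_predict(query, train_embs, train_labels)
prototype_label = prototype_based_predict(
    query, build_prototypes(train_embs, train_labels))
print(example_label, neighbours, prototype_label)
```

In this sketch the example-based classifier explains a prediction by pointing at specific training memes, whereas the prototype-based classifier explains it by pointing at a per-class summary. That is the trade-off the comparison between the two families of methods is meant to probe.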