In a context where online platforms have been effectively weaponized in a variety of geopolitical events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging state-of-the-art (SOTA) textual and visual models to represent the individual cases. We study the relevance of our modular and explainable models for detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance of example- and prototype-based methods, and of text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We also devise a user-friendly interface that facilitates comparative analysis of the examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.
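To make the distinction between example-based and prototype-based reasoning concrete, here is a minimal sketch. The abstract does not specify the similarity function, the fusion strategy, or how prototypes are formed. The sketch therefore assumes cosine similarity, multimodal fusion by concatenating precomputed text and image embeddings, and one class-mean prototype per label. The helper names (`fuse`, `knn_predict`, `build_prototypes`, `prototype_predict`) and the toy vectors are hypothetical stand-ins for the outputs of the text and vision models, not the authors' implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fuse(text_emb, image_emb):
    # Assumption: the multimodal representation is a plain concatenation
    # of the text embedding and the image embedding.
    return list(text_emb) + list(image_emb)


def knn_predict(query, train, k=3):
    """Example-based: majority vote over the k most similar training memes.

    The retrieved neighbors are returned too, since they serve as the explanation.
    """
    ranked = sorted(train, key=lambda ex: cosine(query, ex[0]), reverse=True)
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors


def build_prototypes(train):
    """Prototype-based: one prototype per class, the mean of that class's vectors."""
    sums, counts = {}, Counter()
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def prototype_predict(query, prototypes):
    """Assign the label of the most similar class prototype."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Toy usage: 2-d "text" and 2-d "image" embeddings, invented for illustration.
train = [
    (fuse([1.0, 0.0], [0.9, 0.1]), "hateful"),
    (fuse([0.9, 0.1], [1.0, 0.0]), "hateful"),
    (fuse([0.0, 1.0], [0.1, 0.9]), "benign"),
    (fuse([0.1, 0.9], [0.0, 1.0]), "benign"),
]
query = fuse([0.8, 0.2], [0.7, 0.3])

label, neighbors = knn_predict(query, train, k=3)
prototypes = build_prototypes(train)
proto_label = prototype_predict(query, prototypes)
print(label, proto_label)
```

The example-based method explains a decision by pointing at concrete training memes. The prototype-based method instead compares the query against one summary vector per class. Swapping `fuse` for a text-only or image-only embedding gives the unimodal variants that the abstract compares against the multimodal one.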