The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, to a degree unprecedented in the digital age. Regrettably, this growth has also brought a significant increase in the online abuse of memes. Evaluating the negative impact of memes is notably challenging because their meanings are often subtle and implicit, and are not conveyed directly by the overt text and imagery. Meanwhile, large multimodal models (LMMs) have attracted considerable interest for their remarkable capabilities across diverse multimodal tasks. In response, this paper thoroughly examines the capacity of various LMMs (e.g., GPT-4V) to discern and respond to the nuanced aspects of social abuse manifested in memes. We introduce a comprehensive meme benchmark, GOAT-Bench, comprising over 6K varied memes that encapsulate themes such as implicit hate speech, sexism, and cyberbullying. Using GOAT-Bench, we investigate the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content. Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse. We posit that this shortfall is a critical impediment to the realization of safe artificial intelligence. GOAT-Bench and accompanying resources are publicly accessible at https://goatlmm.github.io/, contributing to ongoing research in this vital field.