STEMTOX: From Social Tags to Fine-Grained Toxic Meme Detection via Entropy-Guided Multi-Task Learning

Memes, a widely used mode of online communication, often serve as vehicles for spreading harmful content. However, limited data accessibility and the high cost of dataset curation hinder the development of robust meme moderation systems. To address this challenge, we introduce TOXICTAGS, a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification into toxic and normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of the dataset is that each meme is enriched with auxiliary metadata in the form of socially relevant tags, which add context to the post. In addition, we propose STEMTOX, a novel entropy-guided multi-task framework that integrates the generation of socially grounded tags with a robust classification framework. Experimental results show that incorporating these tags substantially improves the performance of state-of-the-art vision-language models (VLMs) on toxicity detection tasks. Our contributions offer a novel and scalable foundation for improved content moderation in multimodal online environments. Warning: this paper contains potentially toxic content.
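As a rough illustration of the two-stage annotation scheme described above, the sketch below models a single post as a Python record and enforces the dependency between the stages: a fine-grained label exists only for posts marked toxic in stage (i). The field names (`post_id`, `image_path`, `social_tags`), the assumption of exactly one fine-grained label per toxic post, and the `fine_grained_target` helper are all hypothetical; the abstract does not describe the released file format or how the labels are consumed during training.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class BinaryLabel(Enum):
    """Stage (i): coarse label assigned to every post."""
    NORMAL = "normal"
    TOXIC = "toxic"


class ToxicType(Enum):
    """Stage (ii): fine-grained label, assigned only to toxic posts."""
    HATEFUL = "hateful"
    DANGEROUS = "dangerous"
    OFFENSIVE = "offensive"


@dataclass
class MemePost:
    # Hypothetical schema: field names are illustrative, not the dataset's actual format.
    post_id: str
    image_path: str
    binary_label: BinaryLabel
    toxic_type: Optional[ToxicType] = None  # assumed: one fine-grained label per toxic post
    social_tags: List[str] = field(default_factory=list)  # auxiliary social-tag metadata

    def __post_init__(self) -> None:
        # Stage (ii) labels only make sense for posts labelled toxic in stage (i).
        if self.binary_label is BinaryLabel.NORMAL and self.toxic_type is not None:
            raise ValueError("normal posts must not carry a fine-grained toxic label")
        if self.binary_label is BinaryLabel.TOXIC and self.toxic_type is None:
            raise ValueError("toxic posts need a fine-grained label from stage (ii)")


def fine_grained_target(post: MemePost) -> str:
    """Hypothetical helper: collapse both stages into one label for a 4-way classifier."""
    if post.toxic_type is not None:
        return post.toxic_type.value
    return post.binary_label.value
```

For example, `MemePost("1", "a.jpg", BinaryLabel.TOXIC, ToxicType.HATEFUL, ["#politics"])` maps to the target `"hateful"`, while a normal post with no fine-grained label maps to `"normal"`. Validating in `__post_init__` keeps inconsistent records (for instance, a normal post tagged as offensive) out of the training data at load time.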