Memes are a widely used mode of online communication and often serve as vehicles for spreading harmful content. However, limited data accessibility and the high cost of dataset curation hinder the development of robust meme moderation systems. To address this challenge, we introduce TOXICTAGS, a first-of-its-kind dataset of 6,300 real-world meme-based posts annotated in two stages: (i) binary classification as toxic or normal, and (ii) fine-grained labelling of toxic memes as hateful, dangerous, or offensive. A key feature of the dataset is that each meme is enriched with auxiliary metadata in the form of socially relevant tags, which add context. In addition, we propose STEMTOX, a novel entropy-guided multi-task framework that integrates the generation of socially grounded tags with a robust classification framework. Experimental results show that incorporating these tags substantially enhances the performance of state-of-the-art vision-language models (VLMs) on toxicity detection tasks. Our contributions offer a novel and scalable foundation for improved content moderation in multimodal online environments. Warning: Contains potentially toxic content.