Toxicity identification in online multimodal environments remains a challenging task due to the complexity of contextual connections across modalities (e.g., textual and visual). In this paper, we propose a novel framework that integrates Knowledge Distillation (KD) from Large Visual Language Models (LVLMs) with knowledge infusion to enhance the performance of toxicity detection in hateful memes. Our approach extracts sub-knowledge graphs from ConceptNet, a large-scale commonsense Knowledge Graph (KG), for infusion within a compact VLM framework. The relational context between toxic phrases in captions and memes, as well as visual concepts in memes, enhances the model's reasoning capabilities. Experimental results on two hate speech benchmark datasets demonstrate superior performance over state-of-the-art baselines, with improvements of 1.1% in AU-ROC, 7% in F1, and 35% in Recall. Given the contextual complexity of the toxicity detection task, our approach showcases the significance of learning from both explicit (i.e., KG) and implicit (i.e., LVLM) contextual cues, incorporated through a hybrid neurosymbolic approach. This is crucial for real-world applications where accurate and scalable recognition of toxic content is critical for creating safer online environments.