Toxicity identification in online multimodal environments remains a challenging task due to the complexity of contextual connections across modalities (e.g., textual and visual). In this paper, we propose a novel framework that integrates Knowledge Distillation (KD) from Large Visual Language Models (LVLMs) with knowledge infusion to enhance the performance of toxicity detection in hateful memes. Our approach extracts sub-knowledge graphs from ConceptNet, a large-scale commonsense Knowledge Graph (KG), and infuses them into a compact VLM framework. The relational context between toxic phrases in captions and memes, together with visual concepts in memes, enhances the model's reasoning capabilities. Experimental results on two hate speech benchmark datasets demonstrate superior performance over state-of-the-art baselines across AUROC, F1, and Recall, with improvements of 1.1%, 7%, and 35%, respectively. Given the contextual complexity of the toxicity detection task, our approach showcases the significance of learning from both explicit (i.e., KG) and implicit (i.e., LVLMs) contextual cues incorporated through a hybrid neurosymbolic approach. This is crucial for real-world applications where accurate and scalable recognition of toxic content is critical for creating safer online environments.
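The sub-knowledge-graph extraction step described above can be illustrated with a minimal sketch. The snippet below is not the paper's implementation and does not query the real ConceptNet service; it uses a tiny hand-written edge list and a breadth-first k-hop expansion from a seed concept, which is one common way such subgraphs are retrieved before infusion.

```python
# Hedged sketch: k-hop sub-knowledge-graph extraction around a seed concept.
# The edge list below is illustrative toy data, NOT real ConceptNet triples.
from collections import deque

EDGES = [
    ("dog", "RelatedTo", "animal"),
    ("animal", "IsA", "living_thing"),
    ("dog", "CapableOf", "bark"),
    ("bark", "RelatedTo", "sound"),
    ("sound", "IsA", "phenomenon"),
]

def extract_subgraph(seed, edges, hops=2):
    """Breadth-first expansion from `seed`, collecting every triple whose
    head concept lies within `hops` hops of the seed."""
    adjacency = {}
    for head, rel, tail in edges:
        adjacency.setdefault(head, []).append((rel, tail))

    visited = {seed}
    frontier = deque([(seed, 0)])
    triples = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:          # stop expanding past the hop budget
            continue
        for rel, tail in adjacency.get(node, []):
            triples.append((node, rel, tail))
            if tail not in visited:
                visited.add(tail)
                frontier.append((tail, depth + 1))
    return triples

subgraph = extract_subgraph("dog", EDGES, hops=2)
```

In practice the seed concepts would come from toxic phrases detected in the meme caption and from visual concepts in the image, and the retrieved triples would then be encoded and infused into the compact VLM.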