Unveiling Covert Toxicity in Multimodal Data via Toxicity Association Graphs: A Graph-Based Metric and Interpretable Detection Framework

Detecting toxicity in multimodal data remains a significant challenge, because harmful meanings often lurk beneath individually benign modalities and emerge only when the modalities are combined and their semantic associations are activated. To address this, we propose a novel detection framework based on Toxicity Association Graphs (TAGs), which systematically model the semantic associations linking innocuous entities to latent toxic implications. Building on TAGs, we introduce the first quantifiable metric for hidden toxicity, Multimodal Toxicity Covertness (MTC), which measures the degree of concealment in toxic multimodal expressions. By integrating the detection framework with the MTC metric, our approach identifies covert toxicity precisely while keeping the decision-making process fully interpretable, substantially improving transparency in multimodal toxicity detection. To validate the method, we construct the Covert Toxic Dataset, the first benchmark specifically designed to capture high-covertness toxic multimodal instances. The dataset encodes nuanced cross-modal associations and serves as a rigorous testbed for evaluating both the proposed metric and the detection framework. Extensive experiments show that our approach outperforms existing methods in both low- and high-covertness toxicity regimes while delivering clear, interpretable, and auditable detection outcomes. Together, these contributions advance the state of the art in explainable multimodal toxicity detection and lay a foundation for future context-aware, interpretable approaches.

Content Warning: This paper contains examples of toxic multimodal content that may be offensive or disturbing to some readers. Reader discretion is advised.
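The abstract does not specify how TAGs are constructed or how MTC is computed, so the following is only a minimal sketch of the core intuition under our own assumptions: entities from each modality activate related concepts by following association edges, and a toxic implication fires only when a required conjunction of concepts is activated. Every name here (`ASSOCIATIONS`, `TOXIC_RULES`, `activate`, `detect`, and all node labels) is a hypothetical placeholder, and the conjunction rule is an illustrative assumption. The sketch does not compute MTC.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical toy association graph: each concept maps to the concepts it can evoke.
# Node labels are neutral placeholders, not entities from the paper or dataset.
ASSOCIATIONS: Dict[str, List[str]] = {
    "image_object": ["concept_a"],
    "concept_a": ["concept_a2"],
    "text_phrase": ["concept_b"],
    "overt_phrase": ["concept_a2", "concept_b"],
}

# Hypothetical rule (an assumption): a toxic implication is activated only
# when ALL of its required concepts are activated.
TOXIC_RULES: Dict[str, FrozenSet[str]] = {
    "toxic_implication": frozenset({"concept_a2", "concept_b"}),
}


def activate(entities: Iterable[str]) -> Set[str]:
    """Return every concept reachable from the entities (iterative DFS closure)."""
    seen: Set[str] = set()
    stack = list(entities)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(ASSOCIATIONS.get(node, []))
    return seen


def detect(image_entities: Iterable[str], text_entities: Iterable[str]) -> Dict[str, dict]:
    """Report each triggered implication, whether it needs both modalities, and its evidence."""
    img = activate(image_entities)
    txt = activate(text_entities)
    joint = img | txt  # reachability closure distributes over the union of start sets
    report: Dict[str, dict] = {}
    for implication, required in TOXIC_RULES.items():
        if required <= joint:
            report[implication] = {
                # True when neither modality alone activates the full conjunction
                "cross_modal_only": not (required <= img) and not (required <= txt),
                "evidence": sorted(required),
            }
    return report
```

For example, `detect(["image_object"], ["text_phrase"])` flags `toxic_implication` with `cross_modal_only` set to `True`, while either input alone returns an empty report; `detect([], ["overt_phrase"])` flags it from text alone. Returning the activated evidence concepts, rather than a bare score, is one simple way to keep such a decision auditable.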

翻译：多模态数据中的毒性检测仍面临重大挑战，有害含义常潜藏于看似良性的单模态内容之下：仅当多模态组合且语义关联被激活时才显现。为此，我们提出一种基于毒性关联图的新型检测框架，该系统化建模无害实体与潜在毒性含义之间的语义关联。借助毒性关联图，我们首次提出可量化的隐蔽毒性度量标准——多模态毒性隐蔽度，用于衡量毒性多模态表达的隐蔽程度。通过将检测框架与多模态毒性隐蔽度度量相结合，我们的方法能够精准识别隐蔽毒性，同时保持决策过程的完全可解释性，显著提升多模态毒性检测的透明度。为验证方法有效性，我们构建了首个专门针对高隐蔽度毒性多模态实例的基准数据集——隐蔽毒性数据集。该数据集编码了细微的跨模态关联，为评估所提出的度量标准和检测框架提供了严格测试平台。大量实验表明，我们的方法在低隐蔽度与高隐蔽度毒性场景下均优于现有方法，同时提供清晰、可解释且可审计的检测结果。综合而言，我们的研究推动了可解释多模态毒性检测的技术前沿，并为未来情境感知与可解释方法奠定了基础。内容警示：本文包含可能令部分读者感到冒犯或不适的毒性多模态内容示例，建议读者谨慎阅读。