Artificial intelligence (AI) has developed rapidly, driven by advances in computational power and the growth of massive datasets. However, this progress has also heightened the challenge of interpreting the "black-box" nature of AI models. To address these concerns, eXplainable AI (XAI) has emerged, focusing on transparency and interpretability to enhance human understanding of and trust in AI decision-making processes. For multimodal data fusion and complex reasoning scenarios, Multimodal eXplainable AI (MXAI) has been proposed to integrate multiple modalities for both prediction and explanation tasks. Meanwhile, the advent of Large Language Models (LLMs) has led to remarkable breakthroughs in natural language processing, yet their complexity has further intensified the explainability challenges that MXAI must address. To provide key insights into the development of MXAI methods and crucial guidance for building more transparent, fair, and trustworthy AI systems, we review MXAI methods from a historical perspective and categorize them into four eras: traditional machine learning, deep learning, discriminative foundation models, and generative LLMs. We also review the evaluation metrics and datasets used in MXAI research, concluding with a discussion of future challenges and directions. A project related to this review is available at https://github.com/ShilinSun/mxai_review.