In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal research area, with applications ranging from multimedia analysis to healthcare diagnostics. As AI systems are deployed ever more widely, the demand for transparency and comprehensibility in their decision-making processes has intensified. This survey examines interpretable cross-modal reasoning (I-CMR), where the objective is not only to achieve high predictive performance but also to provide human-understandable explanations for the results. It presents a comprehensive overview of typical I-CMR methods, organized under a three-level taxonomy. It further reviews existing CMR datasets that include explanation annotations. Finally, it summarizes the open challenges for I-CMR and discusses potential future directions. By offering researchers a panoramic perspective that illuminates the state of the art and identifies open opportunities, this survey aims to catalyze progress in this emerging research area. The summarized methods, datasets, and other resources are available at https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning.