In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area, with applications ranging from multimedia analysis to healthcare diagnostics. As AI systems are deployed ever more widely, the demand for transparency and comprehensibility in their decision-making processes has intensified. This survey examines interpretable cross-modal reasoning (I-CMR), where the objective is not only to achieve high predictive performance but also to provide human-understandable explanations for the results. It presents a comprehensive overview of typical I-CMR methods, organized under a three-level taxonomy. It also reviews existing CMR datasets that include annotations for explanations. Finally, it summarizes the open challenges for I-CMR and discusses potential future directions. Overall, this survey aims to catalyze progress in this emerging research area by giving researchers a panoramic view that illuminates the state of the art and identifies open opportunities.