In the health domain, decisions are often based on multiple data modalities. When building prediction models, multimodal fusion approaches that can extract and combine relevant features from different data modalities can therefore be highly beneficial. Furthermore, especially in high-stakes domains, it is important to understand how each modality impacts the final prediction, so that these models can be used in a trustworthy and responsible manner. We propose MultiFIX: a new interpretability-focused multimodal data fusion pipeline that explicitly induces separate features from different data types, which can subsequently be combined to make a final prediction. An end-to-end deep learning architecture is used to train a predictive model and to extract representative features for each modality. Each part of the model is then explained using explainable artificial intelligence techniques: attention maps highlight important regions in image inputs, and inherently interpretable symbolic expressions, learned with GP-GOMEA, describe the contribution of tabular inputs. The fusion of the extracted features into a prediction of the target label is likewise replaced by a symbolic expression learned with GP-GOMEA. Results on synthetic problems demonstrate the strengths and limitations of MultiFIX. Finally, we apply MultiFIX to a publicly available dataset for the detection of malignant skin lesions.