Most convolutional neural network (CNN)-based methods for skin cancer classification obtain their results using only dermatological images. Although these methods achieve good classification results, accuracy can be improved further by also considering the patient's metadata, which is valuable clinical information for dermatologists. Current multi-modal classification methods use only a simple joint fusion structure (FS) and simple fusion modules (FMs), so there is still room to increase accuracy by exploring more advanced FSs and FMs. Therefore, in this paper, we design a new fusion method that combines dermatological images (dermoscopy images or clinical images) and patient metadata for skin cancer classification from the perspectives of both FS and FM. First, we propose a joint-individual fusion (JIF) structure that learns the shared features of multi-modality data while simultaneously preserving modality-specific features. Second, we introduce a fusion attention (FA) module that enhances the most relevant image and metadata features based on both self-attention and mutual-attention mechanisms to support the decision-making pipeline. We compare the proposed JIF-MMFA method with other state-of-the-art fusion methods on three different public datasets. The results show that our JIF-MMFA method improves the classification results for all tested CNN backbones and performs better than the other fusion methods on the three public datasets, demonstrating our method's effectiveness and robustness.
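To make the two ideas above more concrete, the following is a minimal NumPy sketch of one way a joint-individual structure with an attention-based fusion step could be wired together. It is not the implementation from the paper: the class name `JointIndividualFusionSketch`, the sigmoid gating form of the self and mutual attention, the single linear classification heads, the random untrained weights, and the averaging of the three branch probabilities are all assumptions made purely for illustration. The inputs stand in for features that a CNN backbone and a metadata encoder would already have produced.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class JointIndividualFusionSketch:
    """Hypothetical illustration only; weights are random, not trained."""

    def __init__(self, d_img, d_meta, n_classes, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        # Attention gates (assumed form): "self" gates are computed from a
        # modality's own features, "mutual" gates from the other modality.
        self.W_self_img = init(d_img, d_img)
        self.W_mut_img = init(d_meta, d_img)
        self.W_self_meta = init(d_meta, d_meta)
        self.W_mut_meta = init(d_img, d_meta)
        # One linear head per branch: two individual branches, one joint branch.
        self.W_img_head = init(d_img, n_classes)
        self.W_meta_head = init(d_meta, n_classes)
        self.W_joint_head = init(d_img + d_meta, n_classes)

    def fusion_attention(self, f_img, f_meta):
        # Re-weight each modality by the sum of its self gate and its mutual gate.
        gate_img = sigmoid(f_img @ self.W_self_img) + sigmoid(f_meta @ self.W_mut_img)
        gate_meta = sigmoid(f_meta @ self.W_self_meta) + sigmoid(f_img @ self.W_mut_meta)
        return f_img * gate_img, f_meta * gate_meta

    def forward(self, f_img, f_meta):
        # Joint branch: attention-enhanced features are concatenated (shared view).
        att_img, att_meta = self.fusion_attention(f_img, f_meta)
        joint = np.concatenate([att_img, att_meta], axis=-1)
        # Individual branches keep the original modality-specific features.
        logits = {
            "image": f_img @ self.W_img_head,
            "metadata": f_meta @ self.W_meta_head,
            "joint": joint @ self.W_joint_head,
        }
        # Assumption: the final prediction averages the three branch probabilities.
        probs = np.mean([softmax(v) for v in logits.values()], axis=0)
        return probs, logits


# Usage with made-up feature sizes: a batch of 2 samples, 8 image features,
# 4 metadata features, 3 classes.
rng = np.random.default_rng(1)
model = JointIndividualFusionSketch(d_img=8, d_meta=4, n_classes=3)
probs, logits = model.forward(rng.normal(size=(2, 8)), rng.normal(size=(2, 4)))
print(probs.shape)
```

In this sketch the individual branches see the unmodified image and metadata features, while only the joint branch consumes the attention-enhanced ones; that split is one plausible reading of "shared" versus "specific" features, not a detail confirmed by the abstract.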