This study proposes Multitrans, a multi-modal fusion framework based on the Transformer architecture and the self-attention mechanism. The framework combines pre-treatment non-contrast computed tomography (NCCT) images with the discharge diagnosis reports of patients undergoing stroke treatment, and applies several Transformer-based approaches to predict the functional outcome of stroke treatment. The results show that single-modal text classification performs significantly better than single-modal image classification, and that the multi-modal combination outperforms either single modality. Although the Transformer model performs poorly on imaging data alone, combining the images with clinical meta-diagnostic information lets the two modalities learn complementary information and contributes to more accurate prediction of stroke treatment outcomes.
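The abstract does not specify how Multitrans fuses the two modalities. As a rough, hedged illustration of self-attention-based fusion in general (not the authors' implementation), the sketch below concatenates image-patch embeddings and report-token embeddings into one sequence, tags each token with a modality embedding, applies single-head scaled dot-product self-attention, mean-pools the result, and outputs a probability through a logistic head. The function names (`fuse_and_classify`, `init_params`, `self_attention`), the embedding size, the random untrained weights, and the binary outcome label are all hypothetical assumptions; a real model would use trained image and text encoders and stacked multi-head Transformer layers.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    d_k = len(Wq)  # dimension of each query/key vector
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    out = []
    for q in Q:
        # Every token (image or text) attends to every token in the joint sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def init_params(d, seed=0):
    """Hypothetical random, untrained weights; a real model would learn these."""
    rng = random.Random(seed)

    def vec():
        return [rng.gauss(0.0, 0.1) for _ in range(d)]

    def mat():
        return [vec() for _ in range(d)]

    return {
        "Wq": mat(), "Wk": mat(), "Wv": mat(),
        "img_type": vec(), "txt_type": vec(),  # modality (token-type) embeddings
        "w_out": vec(), "b_out": 0.0,
    }


def fuse_and_classify(image_tokens, text_tokens, params):
    """Early fusion: one self-attention pass over image + text tokens, then a logistic head."""
    tokens = [[x + e for x, e in zip(t, params["img_type"])] for t in image_tokens]
    tokens += [[x + e for x, e in zip(t, params["txt_type"])] for t in text_tokens]
    attended = self_attention(tokens, params["Wq"], params["Wk"], params["Wv"])
    # Mean-pool the joint sequence into a single patient-level vector.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    logit = sum(w * x for w, x in zip(params["w_out"], pooled)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))  # probability of the (hypothetical) positive outcome class


if __name__ == "__main__":
    d = 8
    rng = random.Random(1)
    image_tokens = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(4)]  # stand-ins for NCCT patch embeddings
    text_tokens = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(6)]   # stand-ins for report token embeddings
    p = fuse_and_classify(image_tokens, text_tokens, init_params(d))
    print(f"predicted probability (untrained): {p:.3f}")
```

Because attention runs over the joint sequence, every image token can attend to every text token, which is one simple way complementary cross-modal information can be combined; cross-attention between modalities or late fusion of separately pooled embeddings are common alternatives.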