Multimodal Machine Learning Combining Facial Images and Clinical Texts Improves Diagnosis of Rare Genetic Diseases

Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests, and genetic tests over a period of several years in search of a diagnosis. Shortening this diagnostic odyssey would therefore have substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which artificial intelligence algorithms can use to facilitate clinical diagnosis, to prioritize candidate diseases for further examination by laboratory tests or genetic assays, or to support phenotype-driven reinterpretation of genome/exome sequencing data. However, existing methods that use frontal facial photos are built on conventional convolutional neural networks (CNNs), rely exclusively on facial images, and cannot capture the non-facial phenotypic traits and demographic information essential for guiding accurate diagnoses. Here we introduce GestaltMML, a multimodal machine learning (MML) approach based solely on the Transformer architecture. It integrates facial images, demographic information (age, sex, ethnicity), and clinical notes of patients to improve prediction accuracy. We also introduce GestaltGPT, a GPT-based methodology with few-shot learning capacities that relies exclusively on textual inputs and uses a range of large language models (LLMs), including Llama 2, GPT-J, and Falcon. We evaluated these methods on a diverse range of datasets, including 449 diseases from the GestaltMatcher Database and several in-house datasets on Beckwith-Wiedemann syndrome, Sotos syndrome, NAA10-related syndrome (a neurodevelopmental syndrome), and others. Our results suggest that GestaltMML/GestaltGPT effectively incorporate multiple modalities of data, greatly narrow down the candidate genetic diagnoses of rare diseases, and may facilitate the reinterpretation of genome/exome sequencing data.
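
The text-only, few-shot setting described above can be illustrated with a minimal Python sketch of how demographic information and clinical notes might be assembled into a prompt for an LLM. This is not the authors' implementation: the `Case` schema, the function names, the instruction wording, and the example notes are all hypothetical and serve only to show the general shape of a few-shot prompt. The call to an actual model (Llama 2, GPT-J, Falcon) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One patient record reduced to text fields (hypothetical schema)."""
    age: str
    sex: str
    ethnicity: str
    notes: str
    diagnosis: str = ""  # left empty for the case to be predicted


def format_case(case: Case, with_answer: bool) -> str:
    """Render one case as a text block; the answer slot stays open for the query."""
    lines = [
        f"Age: {case.age}",
        f"Sex: {case.sex}",
        f"Ethnicity: {case.ethnicity}",
        f"Clinical notes: {case.notes}",
        f"Diagnosis: {case.diagnosis}" if with_answer else "Diagnosis:",
    ]
    return "\n".join(lines)


def build_few_shot_prompt(examples: list[Case], query: Case) -> str:
    """Concatenate an instruction, the labeled examples, and the unlabeled query."""
    # Assumed instruction wording; the source does not specify a prompt.
    instruction = "Given the patient description, name the most likely genetic syndrome."
    blocks = [instruction]
    blocks += [format_case(c, with_answer=True) for c in examples]
    blocks.append(format_case(query, with_answer=False))
    return "\n\n".join(blocks)


# Illustrative placeholder records, not taken from any dataset in the paper.
examples = [
    Case("2 years", "female", "not reported",
         "Macroglossia, omphalocele, overgrowth.", "Beckwith-Wiedemann syndrome"),
    Case("5 years", "male", "not reported",
         "Overgrowth, macrocephaly, learning disability.", "Sotos syndrome"),
]
query = Case("3 years", "male", "not reported",
             "Large tongue, abdominal wall defect, neonatal hypoglycemia.")

prompt = build_few_shot_prompt(examples, query)
print(prompt)
```

The resulting string ends with an open `Diagnosis:` slot, so a text-completion model would be expected to continue it with a candidate syndrome name. How many labeled examples to include and how to rank multiple candidates are design choices that this sketch does not address.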
