In this work, we present Embeddings for Language/Image-aligned X-Rays (ELIXR), an approach that combines a language-aligned image encoder with, or grafts it onto, a fixed LLM, PaLM 2, to perform a broad range of chest X-ray (CXR) tasks. We train this lightweight adapter architecture on images paired with their corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on three tasks: zero-shot CXR classification (mean AUC of 0.850 across 13 findings); data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings, namely atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema, when trained on 1% (~2,200 images) and 10% (~22,000 images) of the training data, respectively); and semantic search (normalized discounted cumulative gain (NDCG) of 0.76 across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods, including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, with overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
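
For readers unfamiliar with the semantic search metric, the following is a minimal sketch of how NDCG can be computed for a single query. It assumes a linear gain and a log2 rank discount, and it takes the ideal ordering from the retrieved list itself; the abstract does not state which NDCG variant, rank cutoff, or relevance scale was used, so these choices, the function names `dcg` and `ndcg`, and the example relevance grades are illustrative assumptions only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance grades listed in rank order.

    Assumption: linear gain with a log2(rank + 1) discount, ranks starting at 1.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the same grades sorted best-first.

    Assumption: the ideal ranking is a reordering of the retrieved items.
    Returns 0.0 when no retrieved item is relevant.
    """
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for the top results of one query.
perfect_order = [2, 2, 1, 0]    # most relevant items ranked first
imperfect_order = [0, 1, 2, 2]  # same items, worst ordering

print(ndcg(perfect_order))    # 1.0, i.e. "perfect retrieval"
print(ndcg(imperfect_order))  # strictly less than 1.0
```

Under these assumptions, a score of 1.0 for a query corresponds to the "perfect retrieval" case mentioned above, and a reported figure across queries would be the mean of the per-query values.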