ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Atilla Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, Rory Pilgrim, Krish Eswaran, Andrew Sellergren

In this work, we present Embeddings for Language/Image-aligned X-Rays (ELIXR), an approach that leverages a language-aligned image encoder combined with, or grafted onto, a fixed LLM, PaLM 2, to perform a broad range of chest X-ray (CXR) tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on three tasks: zero-shot CXR classification (mean AUC of 0.850 across 13 findings); data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) when trained on 1% (~2,200 images) and 10% (~22,000 images) of the training data, respectively); and semantic search (normalized discounted cumulative gain (NDCG) of 0.76 across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods, including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
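For readers unfamiliar with the semantic search metric, the following is a minimal sketch of how NDCG can be computed for a single query and then averaged across queries. It uses the standard log2-discounted formulation; the graded relevance scale, the example rankings, and the function names `dcg`, `ndcg`, and `mean_ndcg` are illustrative assumptions and are not taken from the ELIXR evaluation protocol.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of graded relevance scores in ranked order."""
    # Rank 0 gets discount log2(2) = 1, rank 1 gets log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the observed ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    if ideal_dcg == 0:
        return 0.0  # no relevant items for this query
    return dcg(relevances) / ideal_dcg


def mean_ndcg(per_query_relevances):
    """Average NDCG over a collection of queries."""
    scores = [ndcg(r) for r in per_query_relevances]
    return sum(scores) / len(scores)


# Hypothetical relevance grades (2 = exact match, 1 = partial, 0 = irrelevant)
# for the top five results of two made-up queries.
perfect_query = [2, 2, 1, 0, 0]    # already in ideal order
imperfect_query = [0, 2, 1, 2, 0]  # a relevant image ranked too low

print(ndcg(perfect_query))
print(ndcg(imperfect_query))
print(mean_ndcg([perfect_query, imperfect_query]))
```

Under this formulation, a query counts as a "perfect retrieval" when its results are already in ideal order, which gives an NDCG of exactly 1.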
