Medical Vision-Language Pretraining (MedVLP) shows promise in learning generalizable and transferable visual representations from paired and unpaired medical images and reports. MedVLP can provide useful features for downstream tasks and facilitate adapting task-specific models to new setups with fewer examples. However, existing MedVLP methods often differ in terms of datasets, preprocessing, and finetuning implementations. This poses great challenges in evaluating how well a MedVLP method generalizes to various clinically relevant tasks due to the lack of a unified, standardized, and comprehensive benchmark. To fill this gap, we propose BenchX, a unified benchmark framework that enables head-to-head comparison and systematic analysis of MedVLP methods using public chest X-ray datasets. Specifically, BenchX comprises three components: 1) comprehensive datasets covering nine datasets and four medical tasks; 2) benchmark suites that standardize data preprocessing, train-test splits, and parameter selection; 3) unified finetuning protocols that accommodate heterogeneous MedVLP methods for consistent task adaptation in classification, segmentation, and report generation. Using BenchX, we establish baselines for nine state-of-the-art MedVLP methods and find that the performance of some early MedVLP methods can be enhanced to surpass more recent ones, prompting a revisit of the developments and conclusions from prior works in MedVLP. Our code is available at https://github.com/yangzhou12/BenchX.
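To make the "unified finetuning protocol" idea concrete, the sketch below shows one way heterogeneous MedVLP backbones could be placed behind a common interface so that every method receives the identical task head and adaptation procedure. This is a hypothetical illustration, not BenchX's actual API: the class names, the `encode` method, and the linear classification head are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of a unified finetuning protocol (NOT BenchX's actual API).
# The idea: wrap heterogeneous MedVLP vision encoders behind one interface so
# every method receives the identical task head and training setup.
import torch
import torch.nn as nn

class MedVLPWrapper(nn.Module):
    """Adapter exposing a common feature interface for any MedVLP backbone."""
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone  # e.g., a ResNet or ViT image encoder
        self.feat_dim = feat_dim  # backbone-specific output dimension

    def encode(self, images: torch.Tensor) -> torch.Tensor:
        # Each method maps images to a flat feature vector here; any
        # method-specific pooling or projection is hidden inside the wrapper.
        return self.backbone(images)

class ClassificationProtocol(nn.Module):
    """Identical task adaptation applied to every wrapped MedVLP method."""
    def __init__(self, wrapper: MedVLPWrapper, num_classes: int):
        super().__init__()
        self.wrapper = wrapper
        self.head = nn.Linear(wrapper.feat_dim, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.head(self.wrapper.encode(images))

# Usage with a stand-in backbone; a real run would load pretrained MedVLP weights.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
model = ClassificationProtocol(MedVLPWrapper(backbone, feat_dim=512), num_classes=14)
logits = model(torch.randn(2, 3, 224, 224))  # -> shape (2, 14)
```

Under this kind of design, differences in downstream performance can be attributed to the pretrained representations themselves rather than to per-method finetuning choices.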