We present OpenMedQ, a medical vision-language model pretrained on the broadest fully open medical data mixture to date: 14 datasets totaling ~3.35M pretraining samples spanning pathology, radiology, microscopy, and text-only clinical QA. OpenMedQ reaches state-of-the-art BLEU-1 on PathVQA (75.9), beating Med-PaLM M variants with up to 562B parameters (~80x larger), and matches the best reported BLEU-1 on VQA-MED (64.5). Its vision encoder, transferred to 8 unseen medical classification benchmarks under an identical downstream recipe, obtains the highest average macro-F1 (0.757), ahead of BiomedCLIP (0.745), PMC-CLIP (0.745), PubMedCLIP (0.746), and a from-scratch baseline (0.616). We release our code and a publicly available interactive demo as a reproducible baseline for the community.