Biomedical multimodal assistants have the potential to unify radiology, pathology, and clinical-text reasoning, yet a critical deployment gap remains: top-performing systems are either closed-source or computationally prohibitive, which precludes the on-premises deployment required for patient privacy and protected health information (PHI) compliance. We introduce MEDGPT-OSS, an open-weight, 20B-parameter generalist vision-language model designed to facilitate open research in clinical AI. Rather than relying on architectural complexity, MEDGPT-OSS pairs the GPT-oss language backbone with a visual front-end via an optimized three-stage training curriculum. By progressively domain-adapting these modules through rigorous data curation and long-context multimodal alignment, we demonstrate that a 20B model can bridge the capacity gap: it outperforms larger open medical models on out-of-distribution (OOD) multimodal reasoning and complex text-only clinical tasks. By unifying diverse modalities under a single instruction-following interface, MEDGPT-OSS maintains a parameter-efficient footprint compatible with commodity GPUs. We release the complete training recipe, open-weight checkpoints, and a rigorous evaluation harness to serve as a verifiable foundation for privacy-preserving, institution-specific clinical AI research.