We present MedXIAOHE, a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications. MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. To achieve this, we propose an entity-aware continual pretraining framework that organizes heterogeneous medical corpora to broaden knowledge coverage and reduce long-tail gaps (e.g., rare diseases). To support expert-level medical reasoning and interaction, MedXIAOHE incorporates diverse medical reasoning patterns via reinforcement learning and tool-augmented agentic training, enabling multi-step diagnostic reasoning with verifiable decision traces. To improve reliability in real-world use, MedXIAOHE integrates user-preference rubrics, evidence-grounded reasoning, and low-hallucination long-form report generation, with improved adherence to medical instructions. We release this report to document our practical design choices, scaling insights, and evaluation framework, hoping to inspire further research.