Recent medical vision-language models have shown promise on tasks such as VQA, report generation, and anomaly detection. However, most are adapted to structured adult imaging and underperform on fetal ultrasound, which poses distinct challenges: multi-view image reasoning, a broad disease spectrum, and high image diversity. To bridge this gap, we introduce FetalMind, a medical AI system tailored to fetal ultrasound for both report generation and diagnosis. Guided by clinical workflow, we propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations and to steer preference selection along clinically faithful steps via reinforcement learning. This design mitigates variability across diseases and heterogeneity across views, reducing learning bottlenecks while aligning the model's inference with obstetric practice. To train FetalMind at scale, we curate FetalSigma-1M, the first large-scale fetal ultrasound report corpus, comprising 20K reports from twelve medical centers and addressing the scarcity of domain data. Extensive experiments show that FetalMind outperforms open- and closed-source baselines across all gestational stages, achieving +14% average gains and +61.2% higher accuracy on critical conditions while remaining efficient, stable, and scalable. Project Page: https://hexiao0275.github.io/FetalMind.
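To make the bipartite-graph idea concrete, the sketch below shows a minimal rendering of an expert-curated view-disease graph and one simple way it could gate disease predictions by the views actually observed. This is an illustrative assumption, not the paper's SED implementation: the view/disease names, the adjacency mask, and the `gate_disease_logits` function are all hypothetical placeholders.

```python
import torch

# Hypothetical ultrasound views and diseases (illustrative only).
views = ["four_chamber_heart", "transventricular_head", "abdominal_circumference"]
diseases = ["ventricular_septal_defect", "ventriculomegaly", "omphalocele"]

# Expert-curated bipartite edges: which views are diagnostically
# informative for which diseases.
edges = {
    "four_chamber_heart": ["ventricular_septal_defect"],
    "transventricular_head": ["ventriculomegaly"],
    "abdominal_circumference": ["omphalocele"],
}

# Binary adjacency mask M[v, d] = 1 iff view v is linked to disease d.
M = torch.zeros(len(views), len(diseases))
for v, ds in edges.items():
    for d in ds:
        M[views.index(v), diseases.index(d)] = 1.0

def gate_disease_logits(view_ids: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
    """Mask out diseases unsupported by any observed view, decoupling
    view-disease associations in the spirit of the abstract (assumed scheme)."""
    support = M[view_ids].amax(dim=0)  # (num_diseases,): 1 if any observed view links to it
    return logits.masked_fill(support == 0, float("-inf"))

logits = torch.randn(len(diseases))
gated = gate_disease_logits(torch.tensor([0, 1]), logits)  # views 0 and 1 observed
```

Under this reading, hard masking is only one possible design choice; a soft prior or learned edge weights over the same bipartite structure would serve the same decoupling role.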