Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation

Recent medical vision-language models have shown promise on tasks such as VQA, report generation, and anomaly detection. However, most are adapted to structured adult imaging and underperform on fetal ultrasound, which poses three challenges: multi-view image reasoning, a large number of diseases, and high image diversity. To bridge this gap, we introduce FetalMind, a medical AI system tailored to fetal ultrasound for both report generation and diagnosis. Guided by clinical workflow, we propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations and uses reinforcement learning to steer preference selection along clinically faithful steps. This design mitigates variability across diseases and heterogeneity across views, reducing learning bottlenecks while aligning the model's inference with obstetric practice. To train FetalMind at scale, we curate the FetalSigma-1M dataset, the first large-scale fetal ultrasound report corpus, comprising 20K reports from twelve medical centers, which addresses the scarcity of domain data. Extensive experiments show that FetalMind outperforms open- and closed-source baselines across all gestational stages, achieving +14% average gains and +61.2% higher accuracy on critical conditions while remaining efficient, stable, and scalable. Project Page: https://hexiao0275.github.io/FetalMind.
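The abstract does not say how the expert-curated bipartite graph is represented or injected into the model. As a rough illustration only, the sketch below stores view-disease edges as two adjacency maps and looks up which diseases are reachable from a set of acquired views. All view names, disease names, edges, and function names here are hypothetical placeholders, not taken from FetalMind or FetalSigma-1M.

```python
from collections import defaultdict

# Hypothetical view-disease edges, for illustration only. They are NOT
# the expert-curated graph described in the abstract.
EDGES = [
    ("four_chamber_view", "ventricular_septal_defect"),
    ("four_chamber_view", "hypoplastic_left_heart"),
    ("transventricular_view", "ventriculomegaly"),
    ("transventricular_view", "spina_bifida"),
    ("spine_sagittal_view", "spina_bifida"),
]


def build_bipartite(edges):
    """Return (view -> diseases, disease -> views) adjacency maps."""
    view_to_diseases = defaultdict(set)
    disease_to_views = defaultdict(set)
    for view, disease in edges:
        view_to_diseases[view].add(disease)
        disease_to_views[disease].add(view)
    return dict(view_to_diseases), dict(disease_to_views)


def candidate_diseases(available_views, view_to_diseases):
    """Collect diseases linked to at least one of the available views."""
    candidates = set()
    for view in available_views:
        # Views missing from the graph contribute no candidates.
        candidates |= view_to_diseases.get(view, set())
    return candidates


view_to_diseases, disease_to_views = build_bipartite(EDGES)
```

Under these assumptions, `candidate_diseases(["four_chamber_view"], view_to_diseases)` narrows the label space to the diseases attached to that view, and `disease_to_views` answers the reverse question of which views bear on a given disease. How the real system uses such a graph during training or reinforcement learning is not described in the abstract.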

