Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

Data-driven deep learning models have shown great capability in assisting radiologists with breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccurate predictions on rare cases. In this study, we address the long-standing challenge of improving diagnostic model performance on rare cases using long-tailed data. Specifically, we introduce TAILOR, a pipeline that builds a knowledge-driven generative model to produce tailored synthetic data. Using 3,749 lesions as source data, the generative model can generate millions of breast-US images, especially for error-prone rare cases. The generated data can then be used to build a diagnostic model for accurate and interpretable diagnoses. In the prospective external evaluation, our diagnostic model exceeds the average specificity of nine radiologists by 33.5% at the same sensitivity, and it improves the radiologists' performance by providing predictions with an interpretable decision-making process. Moreover, on ductal carcinoma in situ (DCIS), our diagnostic model outperforms all radiologists by a large margin, even though the source data contain only 34 DCIS lesions. We believe that TAILOR can potentially be extended to various diseases and imaging modalities.
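
The comparison "specificity at the same sensitivity" can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `specificity_at_sensitivity`, the thresholding rule, and the toy scores and labels below are all assumptions made for illustration. The idea is to pick the model's decision threshold so that its sensitivity reaches a reference level (for example, the readers' average), and then read off the specificity at that threshold.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Find the highest threshold whose sensitivity meets the target.

    scores: model malignancy scores (higher = more suspicious); hypothetical data.
    labels: 1 for malignant, 0 for benign.
    Returns (threshold, sensitivity, specificity), predicting malignant
    when score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign case")
    if not 0.0 <= target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must lie in [0, 1]")

    # Sweep candidate thresholds from strict to lenient; sensitivity can only
    # grow as the threshold drops, so the first hit is the strictest one.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(s >= threshold for s in positives)
        sensitivity = true_pos / len(positives)
        if sensitivity >= target_sensitivity:
            true_neg = sum(s < threshold for s in negatives)
            return threshold, sensitivity, true_neg / len(negatives)

    # Not reachable: the lowest score as threshold yields sensitivity 1.0.
    raise RuntimeError("no threshold reached the target sensitivity")


# Toy example (made-up numbers, not results from the study).
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, sens, spec = specificity_at_sensitivity(toy_scores, toy_labels, 0.75)
print(f"threshold={threshold}, sensitivity={sens}, specificity={spec}")
```

Matching sensitivity first keeps the comparison fair: a model and a group of readers can only be ranked on specificity when both miss the same fraction of malignant lesions.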
