Robotic ultrasound (US) systems have shown great potential to make US examinations easier and more accurate. Recently, various machine learning techniques have been proposed to automate US image interpretation for robotic US acquisition tasks. However, obtaining large amounts of real US imaging data for training is usually expensive or even infeasible in some clinical applications. An alternative is to build a simulator that generates synthetic US data for training, but differences between simulated and real US images may result in poor model performance. This work presents a Sim2Real framework that efficiently learns robotic US image analysis tasks from simulated data only, for real-world deployment. A style transfer module based on unsupervised contrastive learning is proposed and used as a preprocessing step to convert real US images into the simulation style. A task-relevant model then combines CNNs with vision transformers to generate the task-dependent prediction with improved generalization ability. We demonstrate the effectiveness of our method on an image regression task that predicts the probe position from US images in robotic transesophageal echocardiography (TEE). Our results show that, using only simulated US data and a small amount of unlabelled real data for training, our method achieves performance comparable to semi-supervised and fully supervised learning methods. Moreover, the results indirectly confirm the effectiveness of our previously proposed CT-based US image simulation method.
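The abstract does not state which contrastive objective the style transfer module uses. As a rough illustration only, the sketch below assumes an InfoNCE-style loss, a common choice in unsupervised contrastive learning, in which each query feature should match its own key and be pushed away from the other keys. The function name `info_nce_loss`, the `temperature` value, and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math


def info_nce_loss(queries, keys, temperature=0.07):
    """Mean InfoNCE loss over a batch of feature vectors.

    Assumption: queries[i] and keys[i] form the positive pair, and every
    other key in the batch acts as a negative for queries[i].
    """
    def normalize(vec):
        # L2-normalize so that dot products become cosine similarities.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]

    total = 0.0
    for i, qi in enumerate(q):
        # Similarity of this query to every key, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive key (index i) as the target class.
        total += log_sum_exp - logits[i]
    return total / len(q)


# Toy features: matching pairs should give a much lower loss than swapped pairs.
features = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(features, features)
swapped = info_nce_loss(features, list(reversed(features)))
```

In a style transfer setting, the queries and keys would typically be features of corresponding regions in the input and translated images, so a low loss encourages the translation to change appearance while preserving content. That reading is an assumption about the general technique, not a description of the authors' implementation.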