Video-assisted transoral tracheal intubation (TI) requires an endoscope that helps the physician insert the tracheal tube into the glottis rather than the esophagus. The growing trend toward robot-assisted TI requires a medical robot to distinguish anatomical features as an experienced physician would, a capability that can be imitated with supervised deep-learning techniques. However, real datasets of oropharyngeal organs are often inaccessible because open-source data are limited and patient privacy must be protected. In this work, we propose a domain-adaptive Sim-to-Real framework called IoU-Ranking Blend-ArtFlow (IRB-AF) for image segmentation of oropharyngeal organs. The framework combines an image blending strategy called IoU-Ranking Blend (IRB) with the style-transfer method ArtFlow. IRB alleviates the poor segmentation performance caused by significant domain differences between datasets, while ArtFlow further reduces the discrepancies between them. To cope with the limited availability of real endoscopic images, a virtual oropharynx image dataset generated with the SOFA framework is used to train the semantic segmentation models. We integrated IRB-AF with state-of-the-art domain-adaptive segmentation models. The results show that our approach further improves segmentation accuracy and training stability.