Generalizing Surgical Instruments Segmentation to Unseen Domains with One-to-Many Synthesis

Despite their impressive performance on various surgical scene understanding tasks, deep learning-based methods are often difficult to deploy in real-world surgical applications. Data collection, annotation, and domain shift between sites and patients are the most common obstacles. In this work, we mitigate these data-related issues by efficiently leveraging a minimal set of source images to generate synthetic surgical instrument segmentation datasets, and we achieve strong generalization performance on unseen real domains. Specifically, our framework takes only one background tissue image and at most three images of each foreground instrument as seed images. These source images are extensively transformed to build foreground and background image pools, from which randomly sampled tissue and instrument images are composed with multiple blending techniques to generate new surgical scene images. In addition, we introduce hybrid training-time augmentations to further diversify the training data. Extensive evaluation on three real-world datasets, i.e., Endo2017, Endo2018, and RoboTool, demonstrates that our one-to-many synthetic surgical instrument dataset generation and segmentation framework achieves encouraging performance compared with training on real data. Notably, on the RoboTool dataset, where a more significant domain gap exists, our framework generalizes better by a considerable margin. We expect these results to attract research attention to improving model generalization through data synthesis.
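The one-to-many pipeline described above (transform the seed images into pools, then sample and blend) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `augment`, `build_pool`, `compose`, and `synthesize` are hypothetical, the transformations are reduced to random flips and a brightness gain, and plain alpha compositing stands in for the "multiple blending techniques", which the abstract does not specify.

```python
import numpy as np


def augment(image, mask, rng):
    """Randomly flip an image/mask pair and jitter brightness.

    Assumption: a small stand-in for the extensive transformations
    applied to the seed images.
    """
    if rng.random() < 0.5:  # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, mask = image[::-1], mask[::-1]
    gain = rng.uniform(0.8, 1.2)  # brightness gain (assumed range)
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return image, mask


def build_pool(image, mask, size, rng):
    """Expand one seed image into a pool of `size` transformed variants."""
    return [augment(image, mask, rng) for _ in range(size)]


def compose(background, instrument, inst_mask, alpha=1.0):
    """Alpha-blend the instrument onto the background where the mask is set.

    Returns the blended image and the segmentation label (the mask itself).
    """
    weight = (inst_mask.astype(np.float32) * alpha)[..., None]  # H x W x 1
    blended = weight * instrument + (1.0 - weight) * background
    return blended.astype(np.uint8), inst_mask.astype(np.uint8)


def synthesize(bg_pool, fg_pool, n, rng):
    """Draw n random (background, instrument) pairs and compose them."""
    samples = []
    for _ in range(n):
        bg, _ = bg_pool[rng.integers(len(bg_pool))]
        fg, mask = fg_pool[rng.integers(len(fg_pool))]
        samples.append(compose(bg, fg, mask))
    return samples
```

In this sketch a background seed is paired with an all-zero mask so that both pools share the same `(image, mask)` layout; for example, `build_pool(tissue, np.zeros(tissue.shape[:2], np.uint8), 100, rng)` with `rng = np.random.default_rng(0)`. Because the instrument mask is transformed together with its image, every synthetic image comes with a pixel-accurate label at no annotation cost.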
