Surgical instrument segmentation in laparoscopy is essential for computer-assisted surgical systems. Despite recent progress in deep learning, the dynamic setting of laparoscopic surgery still presents challenges for precise segmentation. The nnU-Net framework has excelled at semantic segmentation, but it analyzes single frames without temporal information. The framework's ease of use, including its automatic self-configuration and low expertise requirements, has made it a popular baseline for comparisons. Optical flow (OF) is commonly used in video tasks to estimate motion between frames and represent it as a single map that carries temporal information. This work uses OF maps as an additional input to the nnU-Net architecture to improve its performance on surgical instrument segmentation, taking advantage of the fact that instruments are the main moving objects in the surgical field. With this new input, the temporal component is added indirectly, without modifying the architecture. Using the CholecSeg8k dataset, three different representations of movement were estimated and used as new inputs, and the resulting models were compared with a baseline model. Results showed that using OF maps improves the detection of classes with high movement, even when these classes are scarce in the dataset. To further improve performance, future work may focus on implementing other OF-preserving augmentations.
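As a rough illustration of adding motion as an extra input without changing the network, the sketch below stacks an RGB frame with channels derived from a precomputed optical-flow field. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `flow_to_channels` and `stack_rgb_and_flow`, the three modes (`"uv"`, `"magnitude"`, `"mag_angle"`), and the channel-first layout are all hypothetical choices made for this example. The abstract does not say which three representations of movement were used, and the flow field is assumed to come from any external estimator.

```python
import numpy as np


def flow_to_channels(flow, mode="uv"):
    """Turn a dense flow field of shape (H, W, 2) into channel-first maps.

    The modes are illustrative assumptions, not the paper's representations:
      "uv"        -> 2 channels: horizontal and vertical displacement
      "magnitude" -> 1 channel: displacement length per pixel
      "mag_angle" -> 2 channels: displacement length and direction (radians)
    """
    flow = np.asarray(flow, dtype=np.float32)
    if flow.ndim != 3 or flow.shape[2] != 2:
        raise ValueError("flow must have shape (H, W, 2)")

    dx, dy = flow[..., 0], flow[..., 1]
    if mode == "uv":
        return np.stack([dx, dy], axis=0)

    magnitude = np.hypot(dx, dy)
    if mode == "magnitude":
        return magnitude[np.newaxis]
    if mode == "mag_angle":
        return np.stack([magnitude, np.arctan2(dy, dx)], axis=0)

    raise ValueError(f"unknown mode: {mode!r}")


def stack_rgb_and_flow(frame_rgb, flow, mode="uv"):
    """Concatenate an (H, W, 3) frame with flow-derived channels.

    Returns a channel-first array of shape (3 + k, H, W), where k is the
    number of flow channels produced by the chosen mode.
    """
    frame = np.asarray(frame_rgb, dtype=np.float32)
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("frame_rgb must have shape (H, W, 3)")

    flow = np.asarray(flow, dtype=np.float32)
    if flow.shape[:2] != frame.shape[:2]:
        raise ValueError("frame and flow must share the same height and width")

    rgb = frame.transpose(2, 0, 1)  # (H, W, 3) -> (3, H, W)
    return np.concatenate([rgb, flow_to_channels(flow, mode)], axis=0)
```

With this layout the segmentation network only sees a change in the number of input channels, which is the sense in which the temporal component is added indirectly. One consequence worth noting is that geometric augmentations such as flips or rotations would also need to transform the flow vectors themselves, which is presumably why OF-preserving augmentations are raised as future work.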