Robot-assisted surgery has advanced considerably alongside developments in medical imaging and robotics. Understanding the surgical scene can greatly improve surgical performance, and semantic segmentation of robotic instruments is a key enabling technology for robot-assisted surgery. However, locating an instrument and estimating its pose in complex surgical environments remains a challenging fundamental problem. This paper investigates pixel-wise instrument segmentation. Its contributions are twofold: 1) We propose a two-level nested U-structure model: an encoder-decoder architecture with skip connections in which each layer of the network is itself a U-structure rather than a plain stack of convolutional layers. The model captures more contextual information across multiple scales and better fuses local and global information to achieve high-quality segmentation. 2) We conduct experiments that evaluate our approach qualitatively and quantitatively on three segmentation tasks: binary segmentation, parts segmentation, and type segmentation.
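The nested U-structure described in the abstract can be illustrated with a toy sketch. The code below is not the authors' model: it works on 1-D lists of floats, replaces learned convolutions with a fixed 3-tap moving average, and fuses skip connections by averaging rather than by channel concatenation. All names (`smooth`, `u_block`, `nested_u`, `levels`, `inner_depth`) are hypothetical and chosen only for illustration. What it is meant to show is the control flow: an outer encoder-decoder with skip connections whose every stage is itself a small encoder-decoder.

```python
from typing import List

Signal = List[float]


def smooth(x: Signal) -> Signal:
    """Stand-in for a learned convolution: fixed 3-tap average with edge replication."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def downsample(x: Signal) -> Signal:
    """Halve the resolution by averaging neighbouring pairs (a trailing odd sample is dropped)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x: Signal, size: int) -> Signal:
    """Nearest-neighbour upsampling to exactly `size` samples."""
    return [x[min(i // 2, len(x) - 1)] for i in range(size)]


def fuse(a: Signal, b: Signal) -> Signal:
    """Skip-connection stand-in: element-wise average (assumption; real models usually concatenate)."""
    return [(p + q) / 2.0 for p, q in zip(a, b)]


def u_block(x: Signal, depth: int) -> Signal:
    """Inner U: encode up to `depth` times, then decode while fusing the saved skips."""
    h = smooth(x)
    skips: List[Signal] = []
    for _ in range(depth):
        if len(h) < 2:  # too short to downsample further
            break
        skips.append(h)
        h = smooth(downsample(h))
    for skip in reversed(skips):
        h = smooth(fuse(upsample(h, len(skip)), skip))
    return h


def nested_u(x: Signal, levels: int = 2, inner_depth: int = 2) -> Signal:
    """Outer U whose every stage is a `u_block` instead of a plain stack of convolutions."""
    h = x
    skips: List[Signal] = []
    for _ in range(levels):  # outer encoder
        if len(h) < 2:
            break
        h = u_block(h, inner_depth)
        skips.append(h)
        h = downsample(h)
    h = u_block(h, inner_depth)  # bottleneck stage
    for skip in reversed(skips):  # outer decoder with skip connections
        h = u_block(fuse(upsample(h, len(skip)), skip), inner_depth)
    return h


if __name__ == "__main__":
    signal = [float(i % 4) for i in range(16)]
    print(len(nested_u(signal)))  # output has the same length as the input
```

Because each inner block sees its input at several resolutions before the outer network downsamples again, context from multiple scales is mixed at every stage, which is the intuition behind the claim about fusing local and global information. A real implementation would use 2-D learned convolutions and a per-pixel classification head sized to the task (binary, parts, or type segmentation).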