Purpose: Natural orifice surgeries minimize the need for incisions and reduce recovery time compared to open surgery; however, they demand a higher level of expertise because of visualization and orientation challenges. We propose a perception pipeline for these surgeries that enables semantic scene understanding. Methods: We combine learning-based segmentation, depth estimation, and 3D reconstruction modules to create real-time segmented maps of surgical scenes. Additionally, we register the reconstruction to robot poses to resolve the scale ambiguity inherent in mapping from monocular images, which allows semantically informed real-time reconstructions to be used in robotic surgeries. Results: We achieve sub-millimeter reconstruction accuracy based on average one-sided Chamfer distances, an average pose registration RMSE of 0.9 mm, and an estimated scale within 2% of ground truth. Conclusion: We present a modular perception pipeline that integrates semantic segmentation with real-time monocular SLAM for natural orifice surgeries. This pipeline offers a promising solution for scene understanding that can facilitate automation or surgeon guidance.
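To make the scale-recovery and evaluation steps concrete, the following is a minimal sketch of how a monocular SLAM trajectory might be registered to robot poses and how the reported metrics could be computed. It is not the authors' implementation: the abstract does not name the registration method, so a closed-form similarity alignment (Umeyama's method) is assumed here, and the function names `umeyama_alignment`, `registration_rmse`, and `one_sided_chamfer` are hypothetical. Inputs are assumed to be N×3 arrays of corresponding positions (camera centers from SLAM in arbitrary units, robot end-effector positions in millimeters).

```python
import numpy as np


def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ~= s * R @ src + t.

    src: (N, 3) SLAM camera positions, arbitrary scale (assumed input).
    dst: (N, 3) corresponding robot positions in metric units (assumed input).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src  # recovered metric scale
    t = mu_dst - s * R @ mu_src
    return s, R, t


def registration_rmse(src, dst, s, R, t):
    """Root-mean-square position error after applying the similarity transform."""
    residuals = dst - (s * src @ R.T + t)
    return float(np.sqrt((residuals ** 2).sum(axis=1).mean()))


def one_sided_chamfer(recon, reference):
    """Mean distance from each reconstructed point to its nearest reference point.

    Brute-force O(N*M) version; a KD-tree would be preferable for large clouds.
    """
    diffs = recon[:, None, :] - reference[None, :, :]
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    return float(nearest.mean())
```

In this sketch, the scale `s` returned by `umeyama_alignment` would be applied to the monocular reconstruction before computing `one_sided_chamfer` against a ground-truth surface, so that both the registration RMSE and the Chamfer distance are expressed in the robot's metric units.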