Realistic Data Generation for 6D Pose Estimation of Surgical Instruments

Automation in surgical robotics has the potential to improve patient safety and surgical efficiency, but it is difficult to achieve because it requires robust perception algorithms. In particular, 6D pose estimation of surgical instruments is critical for automatically executing surgical maneuvers based on visual feedback. In recent years, supervised deep learning algorithms have performed increasingly well at 6D pose estimation tasks; yet their success depends on the availability of large amounts of annotated data. In household and industrial settings, synthetic data generated with 3D computer graphics software has been shown to be a viable alternative that reduces the annotation cost of 6D pose datasets. However, this strategy does not translate well to surgical domains, as commercial graphics software has limited tools for generating images that depict realistic instrument-tissue interactions. To address these limitations, we propose an improved simulation environment for surgical robotics that enables the automatic generation of large and diverse datasets for 6D pose estimation of surgical instruments. The improvements include an automated data generation pipeline and an improved surgical scene. To show the applicability of our system, we generated a dataset of 7.5k images with pose annotations of a surgical needle and used it to evaluate a state-of-the-art pose estimation network. The trained model obtained a mean translational error of 2.59 mm on a challenging dataset that presented varying levels of occlusion. These results highlight our pipeline's success in training and evaluating novel vision algorithms for surgical robotics applications.
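
To make the reported metric concrete, the following is a minimal sketch of how a mean translational error could be computed. It assumes the error for each image is the Euclidean distance between the predicted and ground-truth translation vectors, averaged over all images; the abstract does not spell out the exact definition, so this is an illustration rather than the evaluation code used in this work. The function name `mean_translation_error` and the sample values are hypothetical.

```python
import math
from typing import Sequence

Vec3 = Sequence[float]


def mean_translation_error(t_pred: Sequence[Vec3], t_gt: Sequence[Vec3]) -> float:
    """Average Euclidean distance between predicted and ground-truth translations.

    Assumption: both inputs hold one 3D translation per image, in the same
    unit (e.g. millimetres) and in the same order.
    """
    if len(t_pred) != len(t_gt) or len(t_pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-image error: straight-line distance between the two positions.
    errors = [math.dist(p, g) for p, g in zip(t_pred, t_gt)]
    return sum(errors) / len(errors)


# Made-up values in millimetres, for illustration only (not results from this work).
t_gt = [(0.0, 0.0, 100.0), (10.0, -5.0, 120.0)]
t_pred = [(3.0, 4.0, 100.0), (10.0, -5.0, 121.0)]
example_error = mean_translation_error(t_pred, t_gt)
```

In this toy case the per-image errors are 5 mm and 1 mm, so the mean is 3 mm. A full 6D pose evaluation would also report a rotational error, which this sketch leaves out.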
