Continuum robots are promising candidates for interactive tasks in medical and industrial applications due to their unique shape, compliance, and miniaturization capability. Accurate, real-time shape sensing is essential for such tasks yet remains a challenge. Embedded shape sensing has high hardware complexity and cost, while vision-based methods require a stereo setup and struggle to achieve real-time performance. This paper proposes the first eye-to-hand monocular approach to continuum robot shape sensing. Using a deep encoder-decoder network, our method, MoSSNet, eliminates the computational cost of stereo matching and reduces the requirements on sensing hardware. In particular, MoSSNet comprises an encoder and three parallel decoders that uncover spatial, length, and contour information from a single RGB image; the 3D shape is then obtained through curve fitting. A two-segment tendon-driven continuum robot is used for data collection and testing, demonstrating accurate (mean shape error of 0.91 mm, or 0.36% of robot length) and real-time (70 fps) shape sensing on real-world data. Additionally, the method is optimized end-to-end and does not require fiducial markers, manual segmentation, or camera calibration. Code and datasets will be made available at https://github.com/ContinuumRoboticsLab/MoSSNet.
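To make the reported figures concrete, the following is a minimal sketch of how a mean shape error and its percentage of robot length could be computed, together with the per-frame time budget implied by 70 fps. The abstract does not define the error metric, so the definition used here (mean Euclidean distance between corresponding points sampled along the predicted and ground-truth backbones) is an assumption, and the function names and toy data are hypothetical rather than taken from the MoSSNet code.

```python
import math


def mean_shape_error(pred, truth):
    """Mean Euclidean distance between corresponding 3D points.

    Assumed metric: the abstract does not spell out its definition.
    Both curves must be sampled at the same number of points.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("curves must be non-empty and equally sampled")
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(pred)


def polyline_length(points):
    """Length of the piecewise-linear curve through the given points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical toy data: a straight 250 mm backbone along z, sampled
# every 50 mm, and a prediction offset by a constant 1 mm in x.
truth = [(0.0, 0.0, 50.0 * i) for i in range(6)]
pred = [(x + 1.0, y, z) for x, y, z in truth]

err_mm = mean_shape_error(pred, truth)   # mean error in millimetres
length_mm = polyline_length(truth)       # backbone length in millimetres
err_pct = 100.0 * err_mm / length_mm     # error as a percentage of length

# Time available per frame at the reported 70 fps, in milliseconds.
frame_budget_ms = 1000.0 / 70.0
```

On this toy data the error is 1 mm on a 250 mm backbone, or 0.4% of its length, which is the same kind of ratio the abstract reports; a rate of 70 fps leaves roughly 14.3 ms per frame for inference and curve fitting.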