Autonomous navigation is crucial for both medical and industrial endoscopic robots, enabling safe and efficient exploration of narrow tubular environments without continuous human intervention; avoiding contact with the inner walls, however, has remained a longstanding challenge for prior approaches. We present a follow-the-leader endoscopic robot built on a flexible continuum structure, designed to minimize contact between the endoscope body and the intestinal walls and thereby reduce patient discomfort. To achieve this objective, we propose a vision-based deep reinforcement learning framework guided by monocular depth estimation. A realistic intestinal simulation environment was constructed in \textit{NVIDIA Omniverse} to train and evaluate autonomous navigation strategies. Furthermore, thousands of synthetic intraluminal images were generated with NVIDIA Replicator to fine-tune the Depth Anything model, enabling dense three-dimensional perception of the intestinal environment from a single monocular camera. We further introduce a geometry-aware reward-and-penalty mechanism to enable accurate lumen tracking. Compared with the original Depth Anything model, our method improves $\delta_{1}$ depth accuracy by 39.2\%, and it reduces the navigation J-index by 0.67 relative to the second-best method, demonstrating the robustness and effectiveness of the proposed approach.