Continuum manipulators in flexible endoscopic surgical systems offer high dexterity for minimally invasive procedures; however, accurate pose estimation and closed-loop control remain challenging due to hysteresis, compliance, and limited distal sensing. Vision-based approaches reduce hardware complexity but are often constrained by limited geometric observability and high computational overhead, restricting real-time closed-loop applicability. This paper presents a unified framework for markerless stereo 6D pose estimation and position-based visual servoing of continuum manipulators. A photo-realistic simulation pipeline enables large-scale automatic training with pixel-accurate annotations. A stereo-aware multi-feature fusion network jointly exploits segmentation masks, keypoints, heatmaps, and bounding boxes to enhance geometric observability. To enforce geometric consistency without iterative optimization, a feed-forward rendering-based refinement module predicts residual pose corrections in a single pass. A self-supervised sim-to-real adaptation strategy further improves real-world performance using unlabeled data. In extensive real-world validation, the method achieves a mean translation error of 0.83 mm and a mean rotation error of 2.76° across 1,000 samples. Markerless closed-loop visual servoing driven by the estimated pose attains accurate trajectory tracking with a mean translation error of 2.07 mm and a mean rotation error of 7.41°, corresponding to 85% and 59% reductions relative to open-loop control, together with high repeatability in repeated point-reaching tasks. To the best of our knowledge, this is the first fully markerless pose-estimation-driven position-based visual servoing framework for continuum manipulators, enabling precise closed-loop control without physical markers or embedded sensing.