Gastric simulators with objective educational feedback have proven useful for endoscopy training. Existing electronic simulators with feedback are, however, not widely adopted because of their high cost. In this work, a motion-guided dual-camera tracker is proposed to provide reliable, low-cost endoscope tip position feedback inside a mechanical simulator for endoscopy skill evaluation. The tracker addresses two challenges specific to this setting. First, to handle the significant appearance variation of the endoscope tip while keeping tracking consistent across the two cameras, the cross-camera mutual template strategy (CMT) is proposed, which introduces dynamic transient mutual templates into dual-camera tracking. Second, to alleviate disturbance from large occlusion and from distortion caused by the light source at the endoscope tip, the Mamba-based motion-guided prediction head (MMH) is presented, which aggregates visual tracking with historical motion information modeled by a state space model. The proposed tracker was evaluated on datasets captured by low-cost camera pairs during endoscopy procedures performed inside the mechanical simulator. The tracker achieves state-of-the-art (SOTA) performance with robust and consistent tracking on both cameras. Further downstream evaluation shows that the 3D tip position determined by the proposed tracker enables reliable skill differentiation. The code and dataset will be released upon acceptance.