Accurate monocular depth estimation (MDE) is critical for autonomous robotic surgery. However, existing self-supervised methods often exhibit a pronounced ``ex-vivo to in-vivo gap'': they achieve high accuracy on public datasets but struggle in actual clinical deployments. This disparity arises from the strong specular reflections and fluid-filled deformations inherent to real surgeries. Models trained on noisy real-world pseudo-labels consequently suffer from severe boundary collapse. To address this, we leverage the high-fidelity synthetic priors of the \textit{Depth Anything V2} architecture, which inherently capture precise geometric details, and efficiently adapt them to the medical domain using Dynamic Vector Low-Rank Adaptation (DV-LORA). Our contributions are twofold. First, our approach sets a new state of the art on the public SCARED dataset; under a novel physically-stratified evaluation protocol, it reduces Squared Relative Error by over 17\% in high-specularity regimes compared to strong baselines. Second, to provide a rigorous reality check for the field, we introduce \textbf{ROCAL-T 90} (Real Operative CT-Aligned Laparoscopic Trajectories 90), the first real-surgery validation dataset, featuring 90 clinical endoscopic sequences with sub-millimeter ($<1$\,mm) ground-truth trajectories. Evaluations on ROCAL-T 90 demonstrate our model's superior robustness in real clinical settings.