Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data

Accurate depth perception is crucial for patient outcomes in endoscopic surgery, yet it is compromised by image distortions common in surgical settings. To address this, we present a benchmark for assessing the robustness of endoscopic depth estimation models. We compile a dataset that reflects real-world conditions by applying a range of synthetic corruptions at varying severity levels. We also introduce the Depth Estimation Robustness Score (DERS), a novel metric that combines measures of error, accuracy, and robustness to meet the multifaceted requirements of surgical applications and to provide a common basis for comparing depth estimation methods. An analysis of two monocular depth estimation models under this framework reveals how their reliability changes under adverse conditions. Our results underline the need for algorithms that tolerate data corruption. By establishing a robustness benchmark for endoscopic depth estimation, this study provides a foundation for developing more resilient surgical support technologies, with the aim of improving surgical precision and patient safety. Code is available at https://github.com/lofrienger/EndoDepthBenchmark.
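
The evaluation loop described above (corrupt the input at several severity levels, then score the predicted depth for error, accuracy, and robustness) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the severity table `NOISE_SIGMA`, the stand-in `toy_depth_model`, and the `accuracy_retention` ratio are hypothetical and are not taken from the benchmark's code. In particular, `accuracy_retention` is only a simple robustness proxy and is not the DERS definition, which the abstract does not specify. The error and accuracy measures shown are the commonly used absolute relative error and threshold accuracy (δ < 1.25).

```python
import random

# Hypothetical severity table (assumption): noise standard deviation per level.
NOISE_SIGMA = {1: 0.02, 2: 0.04, 3: 0.08, 4: 0.12, 5: 0.18}


def add_gaussian_noise(image, severity, seed=0):
    """Corrupt a flat list of intensities in [0, 1] with clipped Gaussian noise."""
    rng = random.Random(seed)
    sigma = NOISE_SIGMA[severity]
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in image]


def toy_depth_model(image):
    """Stand-in for a real depth network (assumption): brighter means farther."""
    return [1.0 + 4.0 * p for p in image]


def abs_rel(pred, gt):
    """Mean absolute relative error; lower is better."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels with max(pred/gt, gt/pred) < threshold; higher is better."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


def accuracy_retention(acc_corrupted, acc_clean):
    """Hypothetical robustness proxy (not DERS): share of clean accuracy retained."""
    return acc_corrupted / acc_clean if acc_clean > 0 else 0.0


# Toy data: a synthetic "image" and the depth the toy model yields on clean input.
rng = random.Random(42)
clean_image = [rng.random() for _ in range(1000)]
gt_depth = toy_depth_model(clean_image)
clean_acc = delta_accuracy(toy_depth_model(clean_image), gt_depth)

for severity in sorted(NOISE_SIGMA):
    pred = toy_depth_model(add_gaussian_noise(clean_image, severity))
    acc = delta_accuracy(pred, gt_depth)
    print(severity, abs_rel(pred, gt_depth), acc, accuracy_retention(acc, clean_acc))
```

In an actual evaluation, the toy model would be replaced by a trained monocular depth network, the noise function by the full set of corruption types, and the retention ratio by the benchmark's own robustness score.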
