Real-time reconstruction of deformable surgical scenes is vital for advancing robotic surgery, improving surgeon guidance, and enabling automation. Recent methods achieve dense reconstructions from da Vinci robotic surgery videos, with Gaussian Splatting (GS) offering real-time performance via graphics acceleration. However, reconstruction quality in occluded regions remains limited, and depth accuracy has not been fully assessed, since benchmarks such as EndoNeRF and StereoMIS lack 3D ground truth. We propose Diff2DGS, a novel two-stage framework for reliable 3D reconstruction of occluded surgical scenes. In the first stage, a diffusion-based video module with temporal priors inpaints tissue occluded by instruments with high spatiotemporal consistency. In the second stage, we adapt 2D Gaussian Splatting (2DGS) with a Learnable Deformation Model (LDM) to capture dynamic tissue deformation and anatomical geometry. We also extend evaluation beyond prior image-quality metrics by performing a quantitative depth-accuracy analysis on the SCARED dataset. Diff2DGS outperforms state-of-the-art approaches in both appearance and geometry, reaching 38.02 dB PSNR on EndoNeRF and 34.40 dB on StereoMIS. Furthermore, our experiments show that optimizing for image quality alone does not necessarily translate into optimal 3D reconstruction accuracy. We therefore additionally optimize the depth quality of the reconstruction, yielding more faithful geometry alongside high-fidelity appearance.
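To make the second stage concrete, below is a minimal sketch (not the authors' released code) of one way a Learnable Deformation Model could be realized: an MLP that maps a Gaussian's canonical center and a timestamp to a deformed center for splatting. The class names, frequency counts, and layer widths are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """NeRF-style frequency encoding for low-dimensional inputs."""

    def __init__(self, num_freqs: int):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, D) -> (N, D * 2 * num_freqs)
        scaled = x.unsqueeze(-1) * self.freqs  # (N, D, F)
        return torch.cat([scaled.sin(), scaled.cos()], dim=-1).flatten(1)


class LearnableDeformationModel(nn.Module):
    """Maps canonical Gaussian centers plus a timestamp to deformed centers.

    Assumed architecture for illustration; the paper's LDM may differ.
    """

    def __init__(self, pos_freqs: int = 10, time_freqs: int = 6, hidden: int = 256):
        super().__init__()
        self.pos_enc = PositionalEncoding(pos_freqs)
        self.time_enc = PositionalEncoding(time_freqs)
        in_dim = 3 * 2 * pos_freqs + 2 * time_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # per-Gaussian xyz offset
        )

    def forward(self, means: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # means: (N, 3) canonical centers; t: (N, 1) normalized time in [0, 1]
        feat = torch.cat([self.pos_enc(means), self.time_enc(t)], dim=-1)
        return means + self.mlp(feat)  # deformed centers for splatting at time t


# Toy usage: deform 1000 Gaussians to the frame at t = 0.5.
ldm = LearnableDeformationModel()
means = torch.randn(1000, 3)
t = torch.full((1000, 1), 0.5)
deformed = ldm(means, t)  # (1000, 3)
```

In a full pipeline of this kind, the deformed centers would be fed to a 2DGS rasterizer and the MLP trained jointly with the other Gaussian parameters against the inpainted video frames.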