In robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Methods based on Neural Radiance Fields (NeRF) have recently risen to prominence for their strong scene reconstruction ability. However, they suffer from slow inference, prolonged training, and substantial computational demands. Some also rely on stereo depth estimation, which is often infeasible because of the high cost and logistical challenges of stereo cameras. Moreover, monocular reconstruction quality for deformable scenes remains inadequate. To overcome these obstacles, we present Endo-4DGS, a real-time endoscopic dynamic reconstruction approach that uses 4D Gaussian Splatting (GS) and requires no ground-truth depth data. The method extends 3D GS with a temporal component and uses a lightweight MLP to capture how the Gaussians deform over time, which enables reconstruction of dynamic surgical scenes under varying conditions. We also integrate Depth-Anything to generate pseudo-depth maps from monocular views, which serve as guidance for depth-guided reconstruction. We validate our approach on two surgical datasets, where it renders in real time, computes efficiently, and reconstructs with high accuracy. These results underline the potential of Endo-4DGS to improve surgical assistance.
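The temporal deformation idea can be illustrated with a minimal sketch. The abstract does not specify the MLP's inputs, outputs, or size, so everything here is an assumption: a hypothetical two-layer network that maps a Gaussian center plus a timestamp `(x, y, z, t)` to a position offset. The names `init_mlp`, `linear`, and `deform_center` are illustrative and not taken from Endo-4DGS, and plain Python is used only to keep the example dependency-free.

```python
import random

def init_mlp(in_dim, hidden, out_dim, seed=0):
    """Build a tiny two-layer MLP (hypothetical layout, not the paper's).

    The output layer starts at zero, so the initial deformation is the identity.
    """
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(in_dim)],
        "b1": [0.0] * hidden,
        "W2": [[0.0] * out_dim for _ in range(hidden)],
        "b2": [0.0] * out_dim,
    }

def linear(x, W, b):
    """Compute x @ W + b for a vector x and an (in x out) matrix W."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def deform_center(params, center, t):
    """Return the Gaussian center displaced by a time-conditioned offset.

    center: canonical (x, y, z) position; t: timestamp (assumed normalized).
    """
    x = list(center) + [t]                                   # input (x, y, z, t)
    h = [max(v, 0.0) for v in linear(x, params["W1"], params["b1"])]  # ReLU
    offset = linear(h, params["W2"], params["b2"])           # predicted (dx, dy, dz)
    return [c + o for c, o in zip(center, offset)]

params = init_mlp(in_dim=4, hidden=8, out_dim=3)
center = [0.1, -0.2, 0.3]
deformed = deform_center(params, center, t=0.5)
```

In a full 4D GS pipeline the network would be trained jointly with the Gaussians and would typically predict changes to rotation and scale as well; this sketch covers position only.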