In robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Methods based on Neural Radiance Fields (NeRF) have recently gained prominence for their strong scene-reconstruction quality, but they suffer from slow inference, long training times, and high computational cost. In addition, some of these methods rely on stereo depth estimation, which is often infeasible because stereo cameras are costly and logistically difficult to deploy. Moreover, monocular reconstruction of deformable scenes still yields inadequate quality. To overcome these obstacles, we present Endo-4DGS, a real-time endoscopic dynamic reconstruction approach that uses 4D Gaussian Splatting (GS) and requires no ground-truth depth data. The method extends 3D GS with a temporal component and uses a lightweight MLP to model how the Gaussians deform over time, enabling reconstruction of dynamic surgical scenes under varying conditions. We also integrate Depth-Anything to generate pseudo-depth maps from monocular views, which guide the depth-supervised reconstruction. Validation on two surgical datasets shows that our approach renders in real time, is computationally efficient, and reconstructs scenes with high accuracy. These results highlight the potential of Endo-4DGS to improve surgical assistance.
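The temporal deformation step described above can be sketched as follows. This is a minimal, hypothetical illustration and not the Endo-4DGS implementation: the abstract does not specify the network's inputs, width, or outputs, so the names `Gaussian`, `DeformationMLP`, and `deform`, the plain (x, y, z, t) input, the single ReLU hidden layer, and the split of the 10 outputs into position, log-scale, and quaternion offsets are all assumptions modeled on common 4D GS designs. Each canonical Gaussian is queried at time t, and the predicted offsets are added to its center, scale, and rotation before rasterization.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Hypothetical canonical Gaussian (opacity and color omitted for brevity)."""
    mean: tuple[float, float, float]              # 3D center
    log_scale: tuple[float, float, float]         # per-axis log scale
    rotation: tuple[float, float, float, float]   # unit quaternion (w, x, y, z)


class DeformationMLP:
    """Assumed lightweight MLP: (x, y, z, t) -> 10 offsets
    (3 for the center, 3 for the log-scale, 4 for the quaternion)."""

    def __init__(self, hidden: int = 32, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Small output weights keep the initial deformation close to identity.
        self.w2 = [[rng.gauss(0.0, 0.01) for _ in range(hidden)] for _ in range(10)]
        self.b2 = [0.0] * 10

    def __call__(self, x: tuple[float, float, float], t: float) -> list[float]:
        inp = list(x) + [t]
        # Hidden layer with ReLU activation.
        h = [max(0.0, sum(w * v for w, v in zip(row, inp)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Linear output layer producing the 10 offsets.
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]


def normalize(q: tuple[float, ...]) -> tuple[float, ...]:
    """Rescale a quaternion to unit length so it stays a valid rotation."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def deform(g: Gaussian, mlp: DeformationMLP, t: float) -> Gaussian:
    """Return the state of canonical Gaussian g at time t."""
    out = mlp(g.mean, t)
    d_mean, d_scale, d_rot = out[0:3], out[3:6], out[6:10]
    return Gaussian(
        mean=tuple(m + d for m, d in zip(g.mean, d_mean)),
        log_scale=tuple(s + d for s, d in zip(g.log_scale, d_scale)),
        rotation=normalize(tuple(r + d for r, d in zip(g.rotation, d_rot))),
    )
```

For example, `deform(Gaussian((0.1, 0.2, 0.3), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)), DeformationMLP(), 0.5)` returns that Gaussian's state at t = 0.5. In this sketch, predicting offsets from one shared canonical set, rather than storing a separate Gaussian cloud per frame, keeps the model size independent of the number of frames.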