4D Monocular Surgical Reconstruction under Arbitrary Camera Motions

From arXiv: due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than the one in the PDF file.

Reconstructing deformable surgical scenes from endoscopic videos is challenging and clinically important. Recent state-of-the-art methods based on implicit neural representations or 3D Gaussian splatting have made notable progress. However, most are designed for deformable scenes with fixed endoscope viewpoints and rely on stereo depth priors or accurate structure-from-motion for initialization and optimization, limiting their ability to handle monocular sequences with large camera motion in real clinical settings. To address this, we propose Local-EndoGS, a high-quality 4D reconstruction framework for monocular endoscopic sequences with arbitrary camera motion. Local-EndoGS introduces a progressive, window-based global representation that allocates local deformable scene models to each observed window, enabling scalability to long sequences with substantial motion. To overcome unreliable initialization without stereo depth or accurate structure-from-motion, we design a coarse-to-fine strategy integrating multi-view geometry, cross-window information, and monocular depth priors, providing a robust foundation for optimization. We further incorporate long-range 2D pixel trajectory constraints and physical motion priors to improve deformation plausibility. Experiments on three public endoscopic datasets with deformable scenes and varying camera motions show that Local-EndoGS consistently outperforms state-of-the-art methods in appearance quality and geometry. Ablation studies validate the effectiveness of our key designs. Code will be released upon acceptance at: https://github.com/IRMVLab/Local-EndoGS.
