Reconstructing dynamic 3D scenes from monocular videos requires simultaneously capturing high-frequency appearance details and temporally continuous motion. Existing methods using single Gaussian primitives are limited by their low-pass filtering nature, while standard Gabor functions introduce energy instability. Moreover, the lack of temporal continuity constraints often leads to motion artifacts during interpolation. We propose AdaGaR, a unified framework addressing both frequency adaptivity and temporal continuity in explicit dynamic scene modeling. We introduce an Adaptive Gabor Representation that extends Gaussians with learnable frequency weights and adaptive energy compensation to balance detail capture and stability. For temporal continuity, we employ Cubic Hermite Splines with Temporal Curvature Regularization to ensure smooth motion evolution. An Adaptive Initialization mechanism combining depth estimation, point tracking, and foreground masks establishes a stable point cloud distribution early in training. Experiments on Tap-Vid DAVIS demonstrate state-of-the-art performance (PSNR 35.49, SSIM 0.9433, LPIPS 0.0723) and strong generalization across frame interpolation, depth consistency, video editing, and stereo view synthesis. Project page: https://jiewenchan.github.io/AdaGaR/
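To make the frequency-adaptive idea concrete, below is a minimal sketch of a Gabor-modulated Gaussian kernel in Python. The blending parameter `freq_weight` and the renormalization step are illustrative assumptions standing in for the learnable frequency weights and adaptive energy compensation described in the abstract; they are not the paper's exact parametrization.

```python
import numpy as np

def adaptive_gabor_1d(x, sigma, freq, phase, freq_weight):
    """Gaussian envelope modulated by a cosine carrier (1D radial profile).

    freq_weight in [0, 1] blends between a plain Gaussian (0) and a full
    Gabor kernel (1). Hypothetical parametrization for illustration only.
    """
    envelope = np.exp(-x**2 / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * freq * x + phase)
    gabor = envelope * (1.0 - freq_weight + freq_weight * carrier)
    # Energy compensation (assumed form): rescale so the modulated kernel
    # keeps the same L2 energy as its Gaussian envelope, countering the
    # energy instability an unnormalized cosine carrier would introduce.
    energy = np.sqrt(np.sum(gabor**2)) + 1e-8
    ref_energy = np.sqrt(np.sum(envelope**2))
    return gabor * (ref_energy / energy)

# Example: higher freq_weight adds high-frequency detail to the kernel
# while the renormalization keeps its overall energy stable.
x = np.linspace(-3.0, 3.0, 256)
k = adaptive_gabor_1d(x, sigma=1.0, freq=1.5, phase=0.0, freq_weight=0.7)
```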
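Similarly, here is a minimal sketch of cubic Hermite interpolation between per-Gaussian keyframe positions, with a squared-acceleration penalty as a hypothetical stand-in for the Temporal Curvature Regularization term; the function names and the sample count `n` are assumptions, not the paper's implementation.

```python
import numpy as np

def hermite_interp(p0, p1, m0, m1, t):
    """Cubic Hermite spline between keyframes p0, p1 with tangents m0, m1.

    t in [0, 1]; sharing tangents at keyframes across adjacent segments
    gives C1-continuous motion trajectories.
    """
    t2, t3 = t * t, t * t * t
    return ((2*t3 - 3*t2 + 1) * p0 + (t3 - 2*t2 + t) * m0
            + (-2*t3 + 3*t2) * p1 + (t3 - t2) * m1)

def curvature_penalty(p0, p1, m0, m1, n=8):
    """Mean squared second derivative (acceleration) along one segment.

    Assumed stand-in for Temporal Curvature Regularization: penalizing
    acceleration discourages abrupt direction changes between keyframes.
    Uses the closed-form second derivatives of the Hermite basis.
    """
    ts = np.linspace(0.0, 1.0, n)
    acc = ((12*ts - 6)[:, None] * p0 + (6*ts - 4)[:, None] * m0
           + (-12*ts + 6)[:, None] * p1 + (6*ts - 2)[:, None] * m1)
    return np.mean(np.sum(acc**2, axis=-1))

# Example: interpolate a 3D point at the segment midpoint and score it.
p0, p1 = np.zeros(3), np.ones(3)
m0, m1 = np.full(3, 0.5), np.full(3, 0.5)
mid = hermite_interp(p0, p1, m0, m1, t=0.5)
reg = curvature_penalty(p0, p1, m0, m1)
```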