Common computer vision systems typically assume an ideal pinhole camera but fail when facing real-world camera effects such as fisheye distortion and rolling shutter, mainly because they are not trained on data exhibiting these effects. Existing data generation approaches suffer from high costs, sim-to-real gaps, or inaccurate modeling of camera effects. To address this bottleneck, we propose 4D Gaussian Ray Tracing (4D-GRT), a novel two-stage pipeline that combines 4D Gaussian Splatting with physically-based ray tracing for camera effect simulation. Given multi-view videos, 4D-GRT first reconstructs the dynamic scene, then applies ray tracing to generate videos with controllable, physically accurate camera effects. 4D-GRT achieves the fastest rendering speed while delivering better or comparable rendering quality relative to existing baselines. Additionally, we construct eight synthetic dynamic indoor scenes spanning four camera effects as a benchmark for evaluating generated videos with camera effects.