Omnidirectional cameras are extensively used in various applications to provide a wide field of view. However, novel view synthesis from their footage is challenging due to the inevitable presence of dynamic objects, including the photographer, within the captured scene. In this paper, we introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that renders views of the static scene only, simultaneously removing and inpainting dynamic objects. Our approach combines the principles of local radiance fields with bidirectional optimization of omnidirectional rays: given an omnidirectional video as input, we evaluate mutual observations over the entire angular range between previous and current frames. To reduce ghosting artifacts from dynamic objects and to inpaint occluded regions, we devise a multi-resolution motion mask prediction module. Unlike existing methods that primarily separate dynamic components in the temporal domain, our method uses multi-resolution neural feature planes for precise segmentation, which is better suited to long 360-degree videos. Our experiments validate that OmniLocalRF outperforms existing methods in both qualitative and quantitative metrics, especially in scenarios with complex real-world scenes. In particular, our approach eliminates the need for manual interaction, such as hand-drawn motion masks or additional pose estimation, making it a highly effective and efficient solution.