Camera virtualization, an emerging solution to novel view synthesis, holds transformative potential for visual entertainment, live performances, and sports broadcasting by generating photorealistic images from novel viewpoints using a limited set of calibrated static physical cameras. Despite recent advances, spatially and temporally coherent, photorealistic rendering of dynamic scenes with efficient time-archival capabilities, particularly for fast-paced sports and stage performances, remains challenging for existing approaches. Recent methods based on 3D Gaussian Splatting (3DGS) for dynamic scenes can offer real-time view synthesis, yet they are hindered by their dependence on accurate 3D point clouds from structure-from-motion and by their inability to handle large, non-rigid, rapid motions of different subjects (e.g., flips, jumps, articulations, sudden player-to-player transitions). Moreover, independent motions of multiple subjects can break the Gaussian-tracking assumptions commonly used in 4DGS, ST-GS, and other dynamic splatting variants. This paper advocates reconsidering a neural volume rendering formulation for camera virtualization with efficient time archival, making it useful for sports broadcasting and related applications. By modeling a dynamic scene as rigid transformations across multiple synchronized camera views at a given time, our method learns a neural representation that delivers enhanced visual rendering quality at test time. A key contribution of our approach is its support for time archival: users can revisit any past temporal instance of a dynamic scene and perform novel view synthesis, enabling retrospective rendering for replay, analysis, and archival of live events, a functionality absent from existing neural rendering and novel view synthesis approaches.
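To make the neural volume rendering formulation referenced above concrete, the following is a minimal, illustrative sketch of the standard volume rendering quadrature (compositing per-sample densities and colors along a ray into a pixel color). It is not the paper's implementation; the function name, array shapes, and sample values are assumptions for demonstration only.

```python
import numpy as np

def volume_render(sigmas, colors, deltas):
    """Composite samples along a ray into one pixel color (illustrative sketch).

    sigmas: (N,) non-negative volume densities at N samples along the ray
    colors: (N, 3) RGB color predicted at each sample
    deltas: (N,) distances between consecutive samples
    """
    # opacity contributed by each ray segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # transmittance T_i: probability the ray reaches sample i unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    # per-sample compositing weights, then weighted sum of colors
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)  # (3,) pixel color

# hypothetical ray: two empty-space samples, then a dense red region
sigmas = np.array([0.0, 0.0, 50.0, 50.0])
colors = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
deltas = np.full(4, 0.25)
print(volume_render(sigmas, colors, deltas))  # -> approximately [1, 0, 0]
```

In this toy example the empty-space samples contribute zero weight, so the dense red segment dominates the composited pixel, which is the behavior any volume rendering formulation of this form exhibits.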