Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes, the community has studied both scene-specific optimization techniques, which optimize on every test scene, and generalized techniques, which only run a deep net forward pass on a test scene. For dynamic scenes, in contrast, scene-specific optimization techniques exist, but, to the best of our knowledge, there is currently no generalized method for dynamic novel view synthesis from a given monocular video. To answer whether generalized dynamic novel view synthesis from monocular videos is possible today, we establish an analysis framework based on existing techniques and work toward the generalized approach. We find that a pseudo-generalized process without scene-specific appearance optimization is possible, but that it requires geometrically and temporally consistent depth estimates. Despite using no scene-specific appearance optimization, the pseudo-generalized approach improves upon some scene-specific methods.