In the evolving landscape of digital media and video production, precise manipulation and reproduction of visual elements such as camera movements and character actions are in high demand. Existing SLAM methods are limited in dynamic scenes, and human pose estimation often focuses on 2D projections while neglecting the 3D state. To address these issues, we first introduce a reverse filming behavior estimation technique that optimizes camera trajectories by using NeRF as a differentiable renderer and by refining SMPL tracks. We then introduce a cinematic transfer pipeline that can transfer various shot types to a new 2D video or a 3D virtual environment. Incorporating a 3D engine workflow provides superior rendering and control, and it also achieves a higher rating in the user study.