We present a first step towards 4D (3D and time) human video stylization, which addresses style transfer, novel view synthesis, and human animation within a unified framework. While numerous video stylization methods have been developed, they are often restricted to rendering images at the viewpoints of the input video and cannot generalize to novel views or novel poses in dynamic scenes. To overcome these limitations, we leverage Neural Radiance Fields (NeRFs) to represent videos and conduct stylization in the rendered feature space. Our approach represents the human subject and the surrounding scene simultaneously using two NeRFs. This dual representation facilitates the animation of human subjects across various poses and novel viewpoints. Specifically, we introduce a novel geometry-guided tri-plane representation whose feature representation is significantly more robust than that obtained by direct tri-plane optimization. Following video reconstruction, stylization is performed within the NeRFs' rendered feature space. Extensive experiments demonstrate that the proposed method strikes a better balance between stylized textures and temporal coherence than existing approaches. Furthermore, our framework extends to novel poses and viewpoints, making it a versatile tool for creative human video stylization.