We present a method for generating a full 360° orbit video around a person from a single input image. Existing methods typically adapt image-based diffusion models for multi-view synthesis, but they yield results that are inconsistent across views and unfaithful to the original identity. In contrast, recent video diffusion models have demonstrated the ability to generate photorealistic results that align well with the given prompts. Inspired by these results, we propose HumanOrbit, a video diffusion model for multi-view human image generation. Our approach enables the model to synthesize continuous camera rotations around the subject, producing geometrically consistent novel views while preserving the appearance and identity of the person. Using the generated multi-view frames, we further propose a reconstruction pipeline that recovers a textured mesh of the subject. Experimental results validate the effectiveness of HumanOrbit for multi-view image generation and show that the reconstructed 3D models exhibit superior completeness and fidelity compared to those from state-of-the-art baselines.