The demand for realistic and versatile character animation has surged, driven by wide-ranging applications across many domains. However, existing animation generation algorithms that model human pose with 2D or 3D structures face common problems, including low-quality output and a shortage of training data, which prevent them from producing high-quality animation videos. We therefore introduce MVAnimate, a novel framework that synthesizes both 2D and 3D information of dynamic figures from multi-view priors to improve the quality of generated videos. Our approach leverages this multi-view prior information to produce temporally consistent and spatially coherent animations, demonstrating clear improvements over existing animation methods. MVAnimate also optimizes the multi-view videos of the target character, enhancing video quality across different viewpoints. Experimental results on diverse datasets highlight the robustness of our method in handling varied motion patterns and appearances.