Human image animation aims to generate a human motion video from a reference human image and a target motion video. Current diffusion-based image animation systems transfer human identity onto the target motion with high precision, yet their output quality remains inconsistent. They reach peak fidelity only when the physical compositions (i.e., scale and rotation) of the human shapes in the reference image and the target pose frame are aligned; absent such alignment, fidelity and consistency decline noticeably. This compositional misalignment is common in real-world environments, posing a significant obstacle to the practical use of current systems. To this end, we propose Test-time Procrustes Calibration (TPC), which enhances the robustness of diffusion-based image animation systems by maintaining optimal performance under compositional misalignment, effectively addressing real-world scenarios. TPC provides a calibrated reference image to the diffusion model, strengthening its ability to capture the correspondence between human shapes in the reference and target images. Our method is simple, can be applied to any diffusion-based image animation system in a model-agnostic manner, and improves effectiveness at test time without additional training.
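The calibration named in the title builds on classical Procrustes analysis: finding the scale and rotation that best align one human shape to another. As a minimal sketch of that underlying idea (not the authors' implementation), the following aligns 2D pose keypoints by the least-squares similarity transform using Umeyama's SVD-based method; the function name and the assumption that keypoints come from a pose estimator are illustrative:

```python
import numpy as np

def procrustes_align(ref_pts, tgt_pts):
    """Least-squares similarity alignment (Umeyama's method): find the
    scale s, rotation R, and translation that best map the reference
    keypoints onto the target keypoints."""
    X = np.asarray(ref_pts, dtype=float)
    Y = np.asarray(tgt_pts, dtype=float)
    # Center both point sets on their centroids.
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    # Cross-covariance between centered target and reference points.
    H = Yc.T @ Xc
    U, S, Vt = np.linalg.svd(H)
    # Guard against reflections: force det(R) = +1.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.ones(H.shape[0])
    D[-1] = d
    R = U @ np.diag(D) @ Vt                 # optimal rotation
    s = (S * D).sum() / (Xc ** 2).sum()     # optimal scale
    aligned = s * Xc @ R.T + mu_y           # calibrated keypoints
    return aligned, s, R
```

In the spirit of TPC, such a transform estimated between the reference and target human shapes could be used to rescale and rotate the reference image so its composition matches the target pose frame before it is passed to the diffusion model.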