CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation

Character image animation is becoming increasingly important across many domains, driven by the demand for robust and flexible multi-subject rendering. While existing methods excel at single-person animation, they struggle to handle arbitrary subject counts, diverse character types, and spatial misalignment between the reference image and the driving poses. We attribute these limitations to two causes: an overly rigid spatial binding that forces strict pixel-wise alignment between the pose and the reference, and an inability to consistently rebind motion to the intended subjects. To address these challenges, we propose CoDance, a novel Unbind-Rebind framework that animates arbitrary subject counts, types, and spatial configurations conditioned on a single, potentially misaligned pose sequence. Specifically, the Unbind module employs a pose shift encoder that breaks the rigid spatial binding between the pose and the reference by introducing stochastic perturbations to both the poses and their latent features, thereby compelling the model to learn a location-agnostic motion representation. To ensure precise control and subject association, we then devise a Rebind module, which uses semantic guidance from text prompts and spatial guidance from subject masks to direct the learned motion to the intended characters. Furthermore, to facilitate comprehensive evaluation, we introduce CoDanceBench, a new multi-subject benchmark. Extensive experiments on CoDanceBench and existing datasets show that CoDance achieves state-of-the-art performance and generalizes well across diverse subjects and spatial layouts. The code and weights will be open-sourced.

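The abstract does not specify how the stochastic pose perturbation is implemented. As a rough illustration of the idea only, the sketch below assumes 2D keypoints in normalized [0, 1] image coordinates and applies one random translation and one random uniform scale to a whole pose sequence. The function name `perturb_pose_sequence`, its parameters, and the choice of shift-plus-scale are hypothetical and are not taken from CoDance; the latent-feature perturbation and the Rebind module are not covered.

```python
import random
from typing import List, Optional, Tuple

Keypoint = Tuple[float, float]  # (x, y) in normalized [0, 1] image coordinates


def perturb_pose_sequence(
    frames: List[List[Keypoint]],
    max_shift: float = 0.2,                          # assumed value, not from the paper
    scale_range: Tuple[float, float] = (0.8, 1.2),   # assumed value, not from the paper
    rng: Optional[random.Random] = None,
) -> List[List[Keypoint]]:
    """Apply one random shift and one random scale to every frame.

    The same offset and scale are shared by all frames, so the relative
    motion between frames is preserved (up to the scale factor) while the
    absolute location no longer lines up with the reference image.
    """
    rng = rng or random.Random()
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    s = rng.uniform(*scale_range)

    # Scale about the centroid of the first frame so the skeleton keeps its shape.
    first = frames[0]
    cx = sum(x for x, _ in first) / len(first)
    cy = sum(y for _, y in first) / len(first)

    return [
        [(cx + s * (x - cx) + dx, cy + s * (y - cy) + dy) for x, y in frame]
        for frame in frames
    ]
```

In a training loop, a perturbation of this kind would be resampled per clip, so the model cannot rely on the pose and the reference subject occupying the same pixels.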