Music-driven dance generation has garnered significant attention due to its wide range of industrial applications, particularly in the creation of group choreography. However, most existing methods still face three primary issues in the group dance generation process: multi-dancer collisions, single-dancer foot sliding, and abrupt dancer swapping in long group dance generation. In this paper, we propose TCDiff++, a music-driven, end-to-end framework designed to generate harmonious group dance. Specifically, to mitigate multi-dancer collisions, we utilize a dancer positioning embedding to encode temporal and identity information. Additionally, we incorporate a distance-consistency loss to keep inter-dancer distances within plausible ranges. To address single-dancer foot sliding, we introduce a swap mode embedding to indicate dancer swapping patterns and design a Footwork Adaptor to refine raw motion, thereby minimizing foot sliding. For long group dance generation, we present a long group diffusion sampling strategy that reduces abrupt position shifts by injecting positional information into the noisy input. Furthermore, we integrate a Sequence Decoder layer to enhance the model's ability to selectively process long sequences. Extensive experiments demonstrate that TCDiff++ achieves state-of-the-art performance, particularly in long-duration scenarios, ensuring high-quality and coherent group dance generation.
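To make the distance-consistency idea concrete, below is a minimal PyTorch sketch of one way such a loss could be formed: pairwise dancer distances are computed per frame and hinge penalties are applied when they fall outside a plausible range. The tensor layout, the bounds `d_min`/`d_max`, and the function name are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def distance_consistency_loss(positions, d_min=0.5, d_max=3.0):
    """Penalize pairwise dancer distances outside a plausible range.

    positions: (batch, time, num_dancers, 3) root trajectories -- assumed layout.
    d_min / d_max: assumed lower/upper bounds (in meters) on inter-dancer distance.
    """
    # Pairwise differences between all dancers at every frame.
    diff = positions.unsqueeze(3) - positions.unsqueeze(2)   # (B, T, N, N, 3)
    dist = diff.norm(dim=-1)                                  # (B, T, N, N)

    # Consider only distinct dancer pairs (mask out the diagonal).
    n = positions.shape[2]
    pair_mask = ~torch.eye(n, dtype=torch.bool, device=positions.device)

    # Hinge penalties: too close (collision risk) or too far apart.
    too_close = torch.relu(d_min - dist)
    too_far = torch.relu(dist - d_max)
    penalty = (too_close + too_far)[..., pair_mask]

    return penalty.mean()
```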