Dance, as an art form, fundamentally hinges on precise synchronization with musical beats. However, generating aesthetically pleasing dance sequences from music is challenging, and existing methods often fall short in controllability and beat alignment. To address these shortcomings, this paper introduces Beat-It, a novel framework for beat-specific, key pose-guided dance generation. Unlike prior approaches, Beat-It integrates explicit beat awareness with key pose guidance, resolving two main issues: the misalignment of generated dance motions with musical beats, and the inability to map key poses to specific beats, a capability critical for practical choreography. Our approach disentangles beat conditions from music using a nearest beat distance representation and employs a hierarchical multi-condition fusion mechanism. This mechanism integrates key poses, beats, and music features, mitigating conflicts between conditions and providing rich, multi-conditioned guidance for dance generation. Additionally, a specially designed beat alignment loss keeps the generated dance movements in sync with the designated beats. Extensive experiments show that Beat-It outperforms existing state-of-the-art methods in beat alignment and motion controllability.
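As a rough illustration of what a nearest beat distance representation could look like, the sketch below computes, for every motion frame, the number of frames to the closest beat. The function name `nearest_beat_distance`, the encoding of beats as frame indices, and the choice of an unsigned, unnormalized distance are all assumptions made for illustration; the paper's actual representation may differ (for example, it may use signed, clipped, or normalized distances).

```python
from bisect import bisect_left
from typing import List


def nearest_beat_distance(num_frames: int, beat_frames: List[int]) -> List[int]:
    """Hypothetical helper: for each frame t in [0, num_frames), return the
    absolute frame distance from t to the nearest beat frame.

    Assumption: beats are given as integer frame indices on the same
    timeline as the motion sequence.
    """
    if not beat_frames:
        raise ValueError("at least one beat frame is required")
    beats = sorted(beat_frames)  # binary search needs sorted beats
    distances = []
    for t in range(num_frames):
        i = bisect_left(beats, t)  # index of first beat >= t
        candidates = []
        if i < len(beats):
            candidates.append(beats[i] - t)  # next beat at or after t
        if i > 0:
            candidates.append(t - beats[i - 1])  # previous beat before t
        distances.append(min(candidates))
    return distances


if __name__ == "__main__":
    # Ten frames with beats on frames 2 and 7.
    print(nearest_beat_distance(10, [2, 7]))
```

Running the example prints `[2, 1, 0, 1, 2, 2, 1, 0, 1, 2]`: the distance drops to zero on each beat frame, which gives a model a per-frame signal of beat proximity that is separate from the raw music features.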