We propose MDSC (Music-Dance-Style Consistency), the first evaluation metric that assesses how well dance moves and music match. Existing metrics evaluate only motion fidelity, motion diversity, and the degree of rhythmic matching between music and dance. MDSC instead measures how stylistically correlated the generated dance motion sequences are with the conditioning music sequences. We find that directly measuring the embedding distance between motion and music is not an optimal solution, so we model the task as a clustering problem. Specifically, 1) we pre-train a music encoder and a motion encoder; 2) we learn to map and align the motion and music embeddings in a joint space by jointly minimizing the intra-cluster distance and maximizing the inter-cluster distance; and 3) for evaluation, we encode the dance moves into embeddings and measure the intra-cluster and inter-cluster distances, as well as the ratio between them. We evaluate our metric on the results of several music-conditioned motion generation methods. Combined with a user study, these experiments show that the proposed metric is robust in measuring music-dance style correlation.
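The evaluation step (step 3) can be illustrated with a minimal sketch. The abstract does not give the exact distance definitions, so everything here is an assumption made for illustration: the function name `style_cluster_distances` is hypothetical, clusters are taken to be groups of embeddings sharing a style label, intra-cluster distance is the mean Euclidean distance from each embedding to its own cluster centroid, and inter-cluster distance is the mean Euclidean distance between pairs of centroids.

```python
import numpy as np


def style_cluster_distances(embeddings, labels):
    """Return (intra, inter, ratio) for embeddings grouped by style label.

    Assumed definitions (not taken from the paper):
      intra = mean distance from each embedding to its own cluster centroid
      inter = mean distance between all pairs of distinct centroids
      ratio = intra / inter
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    styles = np.unique(labels)  # sorted unique style labels
    if styles.size < 2:
        raise ValueError("need at least two style clusters")

    # One centroid per style cluster, in the same order as `styles`.
    centroids = np.stack(
        [embeddings[labels == s].mean(axis=0) for s in styles]
    )

    # Map every sample to the row index of its own centroid.
    own = np.searchsorted(styles, labels)
    intra = np.linalg.norm(embeddings - centroids[own], axis=1).mean()

    # Pairwise centroid distances; keep each unordered pair once.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(styles.size, k=1)
    inter = pairwise[upper].mean()

    return intra, inter, intra / inter


# Toy example: two well-separated style clusters in 2-D.
toy_embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
toy_labels = [0, 0, 1, 1]
intra, inter, ratio = style_cluster_distances(toy_embeddings, toy_labels)
```

Under these assumed definitions, a smaller ratio corresponds to tighter and better separated style clusters. In practice the embeddings would come from the trained motion encoder rather than hand-written vectors.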