We propose MDSC (Music-Dance-Style Consistency), the first evaluation metric that assesses the degree to which dance moves and music match. Existing metrics can only evaluate the fidelity and diversity of motion and the degree of rhythmic matching between music and motion. MDSC instead measures how stylistically correlated the generated dance motion sequences are with the conditioning music sequences. We found that directly measuring the embedding distance between motion and music is not an optimal solution, so we model the task as a clustering problem. Specifically, 1) we pre-train a music encoder and a motion encoder; 2) we learn to map and align the motion and music embeddings in a joint space by jointly minimizing the intra-cluster distance and maximizing the inter-cluster distance; and 3) for evaluation, we encode the dance moves into embeddings and measure the intra-cluster and inter-cluster distances, as well as the ratio between them. We evaluate our metric on the results of several music-conditioned motion generation methods. Combined with a user study, these experiments show that the proposed metric is robust in measuring music-dance style correlation. The code is available at: https://github.com/zixiangzhou916/MDSC.
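
The evaluation in step 3 can be illustrated with a minimal sketch. It is not the released MDSC implementation: the function name `cluster_distances`, the use of Euclidean distance, and the centroid-based definitions of intra-cluster and inter-cluster distance are all assumptions made for illustration, since the abstract does not specify them. The embeddings are assumed to have already been produced by the motion encoder, with one style label per sequence.

```python
import numpy as np


def cluster_distances(embeddings, labels):
    """Return (intra, inter, intra / inter) for labelled embeddings.

    Assumed definitions (not taken from the paper):
      intra = mean Euclidean distance of each embedding to its own style centroid
      inter = mean Euclidean distance between pairs of distinct style centroids
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted unique style labels
    if len(classes) < 2:
        raise ValueError("need at least two style clusters")

    # One centroid per style cluster, in the same order as `classes`.
    centroids = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )

    # Map every sample to the row index of its own centroid.
    own = np.searchsorted(classes, labels)
    intra = np.linalg.norm(embeddings - centroids[own], axis=1).mean()

    # Pairwise centroid distances; keep each unordered pair once.
    pairwise = np.linalg.norm(
        centroids[:, None, :] - centroids[None, :, :], axis=-1
    )
    upper = np.triu_indices(len(classes), k=1)
    inter = pairwise[upper].mean()

    return intra, inter, intra / inter


if __name__ == "__main__":
    # Toy 2-D "motion embeddings" for two hypothetical style clusters.
    emb = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
    styles = [0, 0, 1, 1]
    print(cluster_distances(emb, styles))
```

Under these assumed definitions, a lower intra-to-inter ratio indicates that embeddings of the same style sit close together relative to the separation between styles.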