Synthesising appropriate choreographies from music remains an open problem. We introduce MDLT, a novel approach that frames choreography generation as a translation task. Our method learns from an existing dataset to translate sequences of audio into corresponding sequences of dance poses. We present two variants of MDLT: one utilising the Transformer architecture and the other employing the Mamba architecture. We train MDLT on the AIST++ and PhantomDance datasets to teach a robotic arm to dance, although the method can also be applied to a full humanoid robot. Evaluation metrics, including Average Joint Error and Fréchet Inception Distance (FID), consistently show that, given a piece of music, MDLT produces realistic, high-quality choreography. The code can be found at github.com/meowatthemoon/MDLT.
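The abstract names two evaluation metrics without defining them. The sketch below is illustrative only and is not taken from the MDLT code. It assumes that Average Joint Error is the mean absolute difference between predicted and reference joint values over all frames and joints. It then computes the Fréchet distance between Gaussians fitted to two feature sets, which is the formula underlying FID. The function names, the `(frames, joints)` pose layout, and the choice of features are hypothetical.

```python
import numpy as np


def average_joint_error(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over all frames and joints.

    Assumption (hypothetical definition): poses are arrays of shape
    (frames, joints), e.g. robotic-arm joint angles. MDLT's exact
    definition may differ.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    return float(np.mean(np.abs(pred - target)))


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (samples, dims). Which features are
    extracted from the dances is an assumption left to the caller.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the 1-feature case as a 1x1 matrix.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b. These are real and non-negative for PSD
    # inputs, so tiny imaginary or negative round-off is discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

For example, shifting every feature of a sample set by 1 leaves the covariance unchanged. The distance therefore reduces to the squared mean difference, which equals the number of feature dimensions.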