What is the Best Automated Metric for Text to Motion Generation?

There is growing interest in generating skeleton-based human motions from natural language descriptions. While most efforts have focused on developing better neural architectures for this task, there has been no significant work on determining the proper evaluation metric. Human evaluation is the ultimate accuracy measure for this task, and automated metrics should correlate well with human quality judgments. Since descriptions are compatible with many motions, determining the right metric is critical for evaluating and designing effective generative models. This paper systematically studies which metrics best align with human evaluations and proposes new metrics that align even better. Our findings indicate that none of the metrics currently used for this task show even a moderate correlation with human judgments on a sample level. However, for assessing average model performance, commonly used metrics such as R-Precision and less-used coordinate errors show strong correlations. Additionally, several recently developed metrics are not recommended due to their low correlation compared to alternatives. We also introduce a novel metric based on a multimodal BERT-like model, MoBERT, which offers strongly human-correlated sample-level evaluations while maintaining near-perfect model-level correlation. Our results demonstrate that this new metric exhibits extensive benefits over all current alternatives.
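The distinction between sample-level and model-level correlation can be illustrated with a minimal sketch. The abstract does not state which correlation statistic or rating scale the study uses, so the Pearson coefficient, the record layout, and the toy numbers below are all assumptions made purely for illustration. They are not the paper's procedure or data.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def sample_level_correlation(records):
    """Correlate metric scores with human ratings over individual samples."""
    metric = [r["metric"] for r in records]
    human = [r["human"] for r in records]
    return pearson(metric, human)


def model_level_correlation(records):
    """Average both scores per model, then correlate the per-model means."""
    by_model = defaultdict(list)
    for r in records:
        by_model[r["model"]].append(r)
    metric_means = [sum(r["metric"] for r in rs) / len(rs) for rs in by_model.values()]
    human_means = [sum(r["human"] for r in rs) / len(rs) for rs in by_model.values()]
    return pearson(metric_means, human_means)


# Hypothetical toy data (made up): three models, three generated samples each.
# "metric" is an automated score, "human" is a human quality rating.
records = [
    {"model": "A", "metric": 0.1, "human": 3.0},
    {"model": "A", "metric": 0.2, "human": 2.0},
    {"model": "A", "metric": 0.3, "human": 1.0},
    {"model": "B", "metric": 0.4, "human": 4.0},
    {"model": "B", "metric": 0.5, "human": 3.0},
    {"model": "B", "metric": 0.6, "human": 2.0},
    {"model": "C", "metric": 0.7, "human": 5.0},
    {"model": "C", "metric": 0.8, "human": 4.0},
    {"model": "C", "metric": 0.9, "human": 3.0},
]

sample_r = sample_level_correlation(records)
model_r = model_level_correlation(records)
print(f"sample-level r = {sample_r:.3f}, model-level r = {model_r:.3f}")
```

In this toy data the metric ranks the three models in the same order as the human raters, so the model-level correlation is essentially perfect, while within each model the metric disagrees with the raters, which pulls the sample-level correlation down to roughly 0.45. This mirrors the kind of gap the abstract reports: a metric can be reliable for comparing average model performance and still be a weak judge of individual samples.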

