Generative artificial intelligence for human motion generation has expanded rapidly, creating the need for a unified evaluation framework. This paper presents a detailed review of eight evaluation metrics for human motion generation, highlighting the distinguishing features and shortcomings of each. We propose standardized practices through a unified evaluation setup to enable consistent comparisons between models. Additionally, we introduce a novel metric that assesses diversity in temporal distortion by analyzing the diversity of temporal warping, thereby improving the evaluation of temporal data. We also conduct experimental analyses of three generative models on a publicly available dataset, offering insights into how each metric should be interpreted in specific scenarios. Our goal is to offer a clear, user-friendly evaluation framework for newcomers, complemented by publicly accessible code.