A reliable and comprehensive evaluation metric that aligns with human preference assessments is crucial for developing conversational head video synthesis methods. Existing quantitative evaluations often fail to capture the full complexity of human preference because each considers only a limited set of evaluation dimensions. Qualitative evaluations and user studies address this gap but are time-consuming and labor-intensive, which hinders the advancement of conversational head generation algorithms and systems. In this paper, we propose a novel learning-based evaluation metric, Preference Score (PS), which fits human preference from quantitative evaluations across multiple dimensions. PS serves as a quantitative evaluation that requires no human annotation. Experimental results validate the superiority of PS in aligning with human perception and demonstrate its robustness and generalizability to unseen data, making it a valuable tool for advancing conversational head generation. We hope this metric will facilitate new advances in the field. Project Page: https://github.com/dc3ea9f/PreferenceScore.
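The abstract does not state which model PS uses to combine per-dimension metrics into one score. Purely as an illustration of the general idea of "fitting human preference from quantitative evaluations across multiple dimensions", the sketch below assumes a linear scorer trained on pairwise human judgments with a Bradley-Terry style logistic loss. The metric dimensions, the function names `fit_preference_weights` and `preference_score`, and the toy data are all hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-clip metric vector, e.g. (lip-sync, identity similarity, visual quality).
Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_preference_weights(
    features: List[Vector],
    pairs: List[Tuple[int, int]],  # (preferred, rejected) indices from human judgments
    lr: float = 0.5,
    epochs: int = 500,
    l2: float = 1e-3,
) -> List[float]:
    """Fit w so that sigmoid(w . (x_preferred - x_rejected)) models P(preferred wins).

    Assumed model (not from the paper): Bradley-Terry style logistic loss on
    metric differences plus an L2 penalty, minimized by full-batch gradient descent.
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(epochs):
        # Gradient of the (l2 / 2) * ||w||^2 penalty.
        grad = [l2 * wi for wi in w]
        for win, lose in pairs:
            diff = [a - b for a, b in zip(features[win], features[lose])]
            p = 1.0 / (1.0 + math.exp(-dot(w, diff)))
            # Gradient of -log(p) w.r.t. w is -(1 - p) * diff, averaged over pairs.
            for k in range(dim):
                grad[k] -= (1.0 - p) * diff[k] / len(pairs)
        w = [wi - lr * gk for wi, gk in zip(w, grad)]
    return w


def preference_score(w: Vector, x: Vector) -> float:
    """Hypothetical preference score: higher is better; only the ordering is meaningful."""
    return dot(w, x)


if __name__ == "__main__":
    # Toy data: hypothetical metric vectors for four generated clips, scaled to [0, 1].
    clips = [(0.9, 0.8, 0.7), (0.6, 0.7, 0.5), (0.5, 0.4, 0.5), (0.2, 0.3, 0.1)]
    # Hypothetical human judgments as (preferred, rejected) clip indices.
    judgments = [(0, 1), (1, 2), (2, 3), (0, 3)]
    w = fit_preference_weights(clips, judgments)
    ranking = sorted(range(len(clips)), key=lambda i: preference_score(w, clips[i]), reverse=True)
    print(ranking)  # prints [0, 1, 2, 3]
```

Training on metric differences rather than raw values means a constant offset added to every clip's metrics cancels out, and a linear model keeps the learned weight of each dimension easy to inspect. A nonlinear regressor could be swapped in under the same pairwise loss if the dimensions interact.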