A multidimensional measurement of photorealistic avatar quality of experience

Photorealistic avatars are human avatars that look, move, and talk like real people. Photorealistic avatar performance has improved significantly in recent years as measured by objective metrics such as PSNR, SSIM, LPIPS, FID, and FVD. However, recent photorealistic avatar publications do not include subjective tests that measure human usability factors. We provide an open source test framework to subjectively measure photorealistic avatar performance in ten dimensions: realism, trust, comfortableness using, comfortableness interacting with, appropriateness for work, creepiness, formality, affinity, resemblance to the person, and emotion accuracy. We show that nine of these subjective metrics correlate only weakly with PSNR, SSIM, LPIPS, FID, and FVD, while emotion accuracy correlates moderately. The crowdsourced subjective test framework is highly reproducible and accurate when compared to a panel of experts. We analyze a wide range of avatars, from photorealistic to cartoon-like, and show that on these dimensions some photorealistic avatars are approaching the performance of real video. We also find that for avatars above a certain level of realism, eight of the measured dimensions are strongly correlated. In particular, for photorealistic avatars there is a linear relationship between avatar affinity and realism; in other words, there is no uncanny valley effect for photorealistic avatars in the telecommunication scenario. We provide several extensions of this test framework for future work and discuss design implications for telecommunication systems. The test framework is available at https://github.com/microsoft/P.910.
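The abstract's central quantitative claim compares crowdsourced subjective ratings with objective metrics via correlation. The sketch below illustrates one common way to make such a comparison, under stated assumptions: each avatar's ratings are averaged into a mean opinion score (MOS), and the per-avatar MOS values are compared with an objective metric using Pearson's r. The helper names (`mos`, `pearson`, `strength`), the 0.4/0.7 cut-offs for "weak/moderate/strong", and the sample ratings and PSNR values are all hypothetical. They are not taken from the paper or from the P.910 framework, and the paper may use a different correlation measure.

```python
from math import sqrt
from statistics import mean


def mos(ratings):
    """Mean opinion score: the average of one avatar's ratings (hypothetical helper)."""
    return mean(ratings)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strength(r, weak=0.4, strong=0.7):
    """Label |r| with hypothetical cut-offs (assumed, not from the paper)."""
    a = abs(r)
    if a < weak:
        return "weak"
    if a < strong:
        return "moderate"
    return "strong"


# Hypothetical example: 1-5 realism ratings per avatar and a made-up PSNR per avatar.
ratings = {"avatar_a": [4, 5, 4], "avatar_b": [2, 3, 2], "avatar_c": [3, 3, 4]}
psnr = {"avatar_a": 31.2, "avatar_b": 33.0, "avatar_c": 29.8}
names = sorted(ratings)
r = pearson([mos(ratings[n]) for n in names], [psnr[n] for n in names])
print(round(r, 2), strength(r))
```

Averaging ratings before correlating keeps per-rater noise out of the comparison. A rank-based measure such as Spearman's rho would be a reasonable alternative if the relationship is monotonic but not linear.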