The capacity to create realistic virtual humans has progressed significantly, and such characters can be found in many applications across entertainment, education and health. As an essential element of interactive virtual humans, speech-driven 3D gesture generation still depends heavily on perceptual evaluation, yet studies often vary avatar appearance and facial presentation when judging the generated motions. Prior work suggests these visual choices can bias motion judgments, but controlled evidence remains limited. We address this gap with controlled evaluations of co-speech gestures across motion sources, spanning seven representative avatar renderings used in contemporary research and application pipelines. Our results show that avatar and face presentation systematically shift perceptual judgments, and we provide recommendations for benchmarking gesture synthesis as well as for deploying virtual humans in human-facing applications.