Animatronic robots promise natural human-robot interaction through lifelike facial expressions. However, generating realistic, speech-synchronized robot expressions remains challenging due to the complexity of facial biomechanics and the need for responsive motion synthesis. This paper introduces a novel, skinning-centric approach that drives animatronic robot facial expressions from speech input. At its core, the approach employs linear blend skinning (LBS) as a unifying representation that guides innovations in both embodiment design and motion synthesis: LBS informs the actuation topology, facilitates retargeting of human expressions, and enables efficient speech-driven facial motion generation. The approach produces highly realistic facial expressions on an animatronic face in real time, at over 4000 fps on a single Nvidia RTX 4090, significantly advancing robots' ability to replicate nuanced human expressions for natural interaction. To foster further research and development, the code is publicly available at \url{https://github.com/library87/OpenRoboExp}.
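For readers unfamiliar with the representation, linear blend skinning deforms each rest-pose vertex as a convex combination of rigid joint transforms; the standard equation, stated here in generic textbook notation rather than symbols defined in this paper, is
\[
    \mathbf{v}_i' \;=\; \sum_{j=1}^{J} w_{ij}\,\mathbf{T}_j\,\mathbf{v}_i,
    \qquad \sum_{j=1}^{J} w_{ij} = 1, \quad w_{ij} \ge 0,
\]
where $\mathbf{v}_i$ is vertex $i$ of the rest-pose mesh in homogeneous coordinates, $\mathbf{T}_j$ is the rigid transform of joint (here, actuator) $j$, and $w_{ij}$ is the skinning weight binding the vertex to that joint.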