Humanoid facial expression shadowing enables robots to imitate human facial expressions realistically and in real time, a capability critical for lifelike, facially expressive humanoid robots and affective human-robot interaction. Existing approaches to humanoid facial expression imitation remain limited, often failing to achieve real-time performance or realistic expressiveness due to offline, video-based inference designs and an insufficient ability to capture and transfer subtle expression details. To address these limitations, we present VividFace, a real-time, realistic facial expression shadowing system for humanoid robots. An optimized imitation framework, X2CNet++, enhances expressiveness by fine-tuning the human-to-humanoid facial motion transfer module and introducing a feature-adaptation training strategy for better alignment across different image sources. Real-time shadowing is further enabled by a video-stream-compatible inference pipeline and a streamlined workflow based on asynchronous I/O for efficient communication across devices. VividFace produces vivid humanoid faces by mimicking human facial expressions within 0.05 seconds, while generalizing across diverse facial configurations. Extensive real-world demonstrations validate its practical utility. Videos are available at: https://lipzh5.github.io/VividFace/.