As humanoid robots are increasingly introduced into social settings, achieving emotionally synchronized multimodal interaction remains a significant challenge. To facilitate the further adoption and integration of humanoid robots into service roles, we present a real-time framework for NAO robots that synchronizes speech prosody with full-body gestures through three key innovations: (1) a dual-channel emotion engine in which a large language model (LLM) simultaneously generates context-aware text responses and biomechanically feasible motion descriptors, constrained by a structured joint movement library; (2) duration-aware dynamic time warping (DTW) for precise temporal alignment of speech output with kinematic motion keyframes; (3) closed-loop feasibility verification that ensures gestures adhere to the NAO's physical joint limits through real-time adaptation. Evaluations show 21% higher emotional alignment than rule-based systems, achieved by coordinating vocal pitch (arousal-driven) with upper-limb kinematics while maintaining lower-body stability. By enabling seamless sensorimotor coordination, this framework advances the deployment of context-aware social robots in dynamic applications such as personalized healthcare, interactive education, and responsive customer service.
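The duration-aware alignment in innovation (2) can be illustrated with a minimal DTW sketch. This is a hypothetical reconstruction, not the paper's implementation: it assumes per-segment durations (e.g., phoneme or phrase lengths from the TTS engine, and keyframe intervals from the motion library) are available as two sequences, and aligns them with a standard DTW recurrence using absolute duration difference as the local cost.

```python
import numpy as np

def dtw_align(speech_durs, keyframe_durs):
    """Align speech segment durations to motion keyframe durations via DTW.

    Returns (total_cost, path), where path is a list of (speech_idx,
    keyframe_idx) pairs. Illustrative sketch only; the actual framework's
    cost function and constraints are not specified in the abstract.
    """
    n, m = len(speech_durs), len(keyframe_durs)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    # Fill the cumulative-cost matrix with the classic DTW recurrence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(speech_durs[i - 1] - keyframe_durs[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # skip a speech segment
                                 D[i, j - 1],      # skip a keyframe
                                 D[i - 1, j - 1])  # match both
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 or j > 0:
        path.append((i - 1, j - 1))
        moves = []
        if i > 0 and j > 0:
            moves.append((D[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            moves.append((D[i - 1, j], i - 1, j))
        if j > 0:
            moves.append((D[i, j - 1], i, j - 1))
        _, i, j = min(moves)
    return D[n, m], path[::-1]
```

Each pair in the returned path tells the scheduler which motion keyframe should fire during which speech segment, so gestures stay in step with prosody even when utterance lengths vary.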