Simulating Infant First-Person Sensorimotor Experience via Motion Retargeting from Babies to Humanoids

Francisco M. López, Hoshinori Kanazawa, Ondrej Fiala, Yakov Balashov, Valentin Marcel, Lukas Rustler, Miles Lenz, Dongmin Kim, Yasuo Kuniyoshi, Jochen Triesch, Matej Hoffmann

From arXiv. Accepted at IEEE ICDL 2026. 8 pages, 6 figures. Cite as: F. M. López, H. Kanazawa, O. Fiala, Y. Balashov, V. Marcel, L. Rustler, M. Lenz, D. Kim, Y. Kuniyoshi, J. Triesch, and M. Hoffmann, "Simulating infant first-person sensorimotor experience via motion retargeting from babies to humanoids," in 2026 IEEE International Conference on Development and Learning (ICDL). IEEE, 2026, pp. 1-8.

Motion retargeting from humans to human-like artificial agents is becoming increasingly important as humanoid robots grow more capable. However, most existing approaches focus only on reproducing kinematics and ignore the rich sensorimotor experience associated with human movement. In this work, we present a framework for simulating the multimodal sensorimotor experiences of infants using physical and virtual humanoids. From a single video, our method reconstructs the infant's body configuration by extracting its skeletal structure and estimating the full 3D pose from each frame. Then we map the reconstructed motion onto several developmental platforms: the physical iCub robot and the virtual simulators pyCub, EMFANT and MIMo. Replaying the retargeted motions on these embodiments produces simulated multisensory streams including proprioception (joints and muscles), touch, and vision. For the best-matching embodiment, the retargeting achieves sub-centimeter accuracy and enables a rich multimodal analysis of infant development as well as enhanced automated annotation of behaviors. This framework provides a unique window into the infant's sensorimotor experience, offering new tools for robotics, developmental science, and early detection of neurodevelopmental disorders. The code is available at https://github.com/ctu-vras/motion-retargeting/.
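As an illustration of what a sub-centimeter retargeting claim could mean in practice, the following minimal sketch computes the mean Euclidean distance between paired 3D keypoints of a source pose and a retargeted pose. The abstract does not state which error metric the authors use, so this metric is an assumption, and the function name `mean_keypoint_error` and the keypoint values are hypothetical and not taken from the paper or its repository.

```python
import math


def mean_keypoint_error(source, retargeted):
    """Mean Euclidean distance (in the input units) between paired 3D keypoints.

    Assumed metric for illustration only; the paper may measure accuracy differently.
    """
    if len(source) != len(retargeted) or not source:
        raise ValueError("need two non-empty keypoint lists of equal length")
    return sum(math.dist(p, q) for p, q in zip(source, retargeted)) / len(source)


# Hypothetical keypoints in metres (made-up values, not data from the paper).
infant = [(0.00, 0.00, 0.00), (0.10, 0.00, 0.00), (0.10, 0.10, 0.00)]
humanoid = [(0.00, 0.00, 0.003), (0.10, 0.004, 0.00), (0.105, 0.10, 0.00)]

error_m = mean_keypoint_error(infant, humanoid)
print(f"mean keypoint error: {error_m * 100:.2f} cm")
```

Under this assumed metric, an embodiment would count as sub-centimeter when `error_m` stays below 0.01 m when averaged over the frames of a clip.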
