Distinguishing self from others is a prerequisite for social intelligence, yet humanoid robots that increasingly share workspaces with humans still lack this ability. Here we show that a humanoid robot can learn self-other distinction from proprioceptive-visual correspondence, without any identity labels or kinematic models. Once established, this distinction bootstraps a predictive self-model that maps joint configurations to three-dimensional body occupancy, capturing how the robot's body changes with action. In multi-agent scenes involving humans or morphologically identical robots, the system reliably identifies itself, learns a 3D self-model, and supports downstream tasks including target reaching, collision-aware motion planning, and human-to-robot motion retargeting. Together, these results outline a route toward bodily self-representation in robots that act and coordinate alongside others in shared physical environments. Project page: https://euron-zc.github.io/humanoid-self-model/.
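To make the idea of proprioceptive-visual correspondence concrete, the following is a minimal sketch and not the authors' method. It assumes each visible agent has already been reduced to a per-frame visual motion signal, and it scores each candidate by how well that signal correlates with the robot's own joint speed. The names `pearson`, `identify_self`, `agent_a`, and `agent_b`, and the synthetic signals, are all hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def identify_self(joint_speed, visual_motion_by_agent):
    """Pick the agent whose visual motion best tracks the proprioceptive signal.

    joint_speed: per-frame joint speed magnitude read from the robot's encoders.
    visual_motion_by_agent: dict of candidate name -> per-frame visual motion.
    No identity labels are used; only the correlation decides.
    """
    scores = {
        name: pearson(joint_speed, motion)
        for name, motion in visual_motion_by_agent.items()
    }
    return max(scores, key=scores.get), scores


# Synthetic demo (assumed data): agent_b moves in step with the robot's joints,
# agent_a moves on an unrelated rhythm.
rng = random.Random(0)
T = 200
joint_speed = [abs(math.sin(0.1 * t)) + 0.05 * rng.random() for t in range(T)]
visual = {
    "agent_a": [abs(math.cos(0.23 * t)) + 0.05 * rng.random() for t in range(T)],
    "agent_b": [0.8 * s + 0.1 * rng.random() for s in joint_speed],
}

best, scores = identify_self(joint_speed, visual)
print("identified as self:", best)
```

A plain correlation score is only a stand-in here. The system described above is learned, and it goes on to build a predictive self-model that maps joint configurations to 3D body occupancy, which this sketch does not attempt.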