Integrating Large Language Models (LLMs) and Vision-Language Models (VLMs) with robotic systems enables robots to process and understand complex natural language instructions and visual information. However, a fundamental challenge remains: for robots to fully capitalize on these advances, they must have a deep understanding of their physical embodiment. The gap between AI models' cognitive capabilities and their understanding of physical embodiment raises the following question: Can a robot autonomously understand and adapt to its physical form and functionalities through interaction with its environment? This question underscores the transition towards self-modeling robots that do not rely on external sensors or pre-programmed knowledge of their structure. Here, we propose a meta self-modeling approach that can deduce robot morphology through proprioception (the internal sense of position and movement). Our study introduces a 12-DoF reconfigurable legged robot, accompanied by a diverse dataset of 200k unique configurations, to systematically investigate the relationship between robot motion and robot morphology. Using a deep neural network composed of a robot signature encoder and a configuration decoder, we demonstrate that our system can accurately predict robot configurations from proprioceptive signals. This research contributes to the field of robotic self-modeling, aiming to enhance robots' understanding of their physical embodiment and their adaptability in real-world scenarios.
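The abstract names only the two stages of the model: a robot signature encoder followed by a configuration decoder. The following is a minimal, untrained sketch of that data flow and not the authors' network. It assumes proprioception arrives as a T × 12 list of joint readings. It also replaces the learned signature encoder with hand-crafted per-joint mean and standard-deviation pooling and uses one random linear scoring layer per joint slot as a stand-in decoder. The names `encode_signature` and `LinearConfigDecoder`, the one-slot-per-joint output layout, and the four options per slot are all hypothetical.

```python
import math
import random
from typing import List, Sequence

NUM_JOINTS = 12  # the reconfigurable robot described above has 12 DoF


def encode_signature(proprio: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for the robot signature encoder.

    Summarizes a T x NUM_JOINTS trajectory of joint readings as a
    fixed-size vector: [mean_0, std_0, mean_1, std_1, ...].
    """
    T = len(proprio)
    if T == 0:
        raise ValueError("need at least one time step")
    signature: List[float] = []
    for j in range(NUM_JOINTS):
        column = [step[j] for step in proprio]
        mean = sum(column) / T
        var = sum((x - mean) ** 2 for x in column) / T  # population variance
        signature.extend([mean, math.sqrt(var)])
    return signature


class LinearConfigDecoder:
    """Hypothetical stand-in for the configuration decoder.

    For each configuration slot, a linear layer scores `num_options`
    candidate choices and the highest-scoring index is selected.
    Weights are random (untrained), so predictions are meaningless;
    only the input/output shapes are illustrative.
    """

    def __init__(self, in_dim: int, num_slots: int, num_options: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_options = num_options
        # weights[slot][option] is a vector of length in_dim
        self.weights = [
            [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(num_options)]
            for _ in range(num_slots)
        ]

    def decode(self, signature: Sequence[float]) -> List[int]:
        config: List[int] = []
        for slot_weights in self.weights:
            scores = [sum(w * x for w, x in zip(row, signature)) for row in slot_weights]
            config.append(max(range(self.num_options), key=scores.__getitem__))
        return config


if __name__ == "__main__":
    # Synthetic proprioceptive trajectory: 100 time steps of 12 joint angles.
    traj = [[math.sin(0.1 * t + j) for j in range(NUM_JOINTS)] for t in range(100)]
    sig = encode_signature(traj)
    decoder = LinearConfigDecoder(in_dim=len(sig), num_slots=NUM_JOINTS, num_options=4)
    print(decoder.decode(sig))  # one option index per slot
```

In the actual system both stages would be learned from the 200k-configuration dataset; here the point is only the pipeline shape, proprioceptive trajectory → fixed-size signature → discrete configuration.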