Recent advances in embodied intelligence have leveraged massive scaling of data and model parameters to master natural-language command following and multi-task control. In contrast, biological systems demonstrate an innate ability to acquire skills rapidly from sparse experience. Crucially, current robotic policies struggle to replicate the dynamic stability, reflexive responsiveness, and temporal memory inherent in biological motion. Here we present Neuromorphic Vision-Language-Action (NeuroVLA), a framework that mimics the structural organization of the biological nervous system across the cortex, cerebellum, and spinal cord. We adopt a system-level bio-inspired design: a high-level model plans goals, an adaptive cerebellum module stabilizes motion using high-frequency sensory feedback, and a bio-inspired spinal layer performs low-latency action generation. NeuroVLA represents the first deployment of a neuromorphic VLA on physical robots, achieving state-of-the-art performance. We observe the emergence of biological motor characteristics without additional data or special guidance: the system suppresses shaking in robotic arms, saves significant energy (only 0.4 W on the neuromorphic processor), exhibits temporal memory, and triggers safety reflexes in under 20 milliseconds.
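The cortex-cerebellum-spinal hierarchy described above can be sketched as three controllers operating at different rates, with the reflex path bypassing the slow planner entirely. This is a minimal illustrative sketch only; all class names, rates, and thresholds below are assumptions for exposition, not NeuroVLA's actual implementation.

```python
# Hypothetical three-level hierarchy inspired by the cortex/cerebellum/spinal
# design. Names and numbers are illustrative assumptions, not the paper's API.

class CortexPlanner:
    """Slow, high-level planner (VLA-style): maps a language goal to a subgoal."""
    def plan(self, instruction: str) -> dict:
        return {"target": instruction}  # stands in for a full VLM/VLA forward pass

class CerebellumStabilizer:
    """Mid-rate module: corrects the planned action with sensory feedback."""
    def stabilize(self, subgoal: dict, feedback: float) -> tuple:
        # Damp oscillation by emitting a small corrective term from feedback.
        correction = -0.1 * feedback
        return subgoal["target"], correction

class SpinalReflex:
    """High-rate layer: emits motor commands and fires a reflex on hazard,
    without a round-trip through the slow planner (low-latency path)."""
    REFLEX_THRESHOLD = 0.9  # assumed hazard-signal cutoff

    def act(self, command: str, hazard_signal: float) -> str:
        if hazard_signal > self.REFLEX_THRESHOLD:
            return "HALT"  # reflex: short-circuit to a safe action
        return command

# One tick of the hierarchy (in practice each level runs at its own frequency).
cortex, cerebellum, spine = CortexPlanner(), CerebellumStabilizer(), SpinalReflex()
subgoal = cortex.plan("pick up the cup")                 # e.g. ~1 Hz
command, _ = cerebellum.stabilize(subgoal, feedback=0.02)  # e.g. ~100 Hz
motor_out = spine.act(command, hazard_signal=0.0)          # e.g. ~1 kHz
```

The key design point mirrored here is that the safety reflex depends only on the fast spinal layer, so its latency is decoupled from planner inference time.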