Vision-language-action (VLA) models and LLM agents have advanced rapidly, yet reliable deployment on physical robots is often hindered by an interface mismatch between agent tool APIs and robot middleware. Current implementations typically rely on ad-hoc wrappers that are difficult to reuse, and changes to the VLA backend or serving stack often require extensive re-integration. We introduce RoboNeuron, a middleware layer that connects the Model Context Protocol (MCP) used by LLM agents with robot middleware such as ROS2. RoboNeuron bridges these ecosystems in three ways: it derives agent-callable tools directly from ROS schemas, it provides a unified execution abstraction that supports both direct commands and modular composition, and it confines changes to the backend, runtime, and acceleration presets behind a stable inference boundary. We evaluate RoboNeuron in simulation and on hardware through multi-platform base control, arm motion, and VLA-based grasping tasks, demonstrating that it enables modular system orchestration under a unified interface and supports backend transitions without system rewiring. The full code is available at https://github.com/guanweifan/RoboNeuron
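To make the idea of deriving agent-callable tools from ROS schemas concrete, the following is a minimal sketch and not RoboNeuron's actual implementation. It assumes message definitions are available as plain dictionaries (the `MESSAGE_DEFS` table is a hand-written stand-in for real ROS interface files), and the helper names `type_to_schema` and `make_tool` are hypothetical. The output uses the `name`, `description`, and `inputSchema` fields of an MCP tool descriptor, with the input schema expressed as JSON Schema.

```python
import json

# Hand-written stand-ins for ROS 2 message definitions (field name -> type).
# Assumption: a real system would read these from ROS interface files instead.
MESSAGE_DEFS = {
    "geometry_msgs/Vector3": {"x": "float64", "y": "float64", "z": "float64"},
    "geometry_msgs/Twist": {
        "linear": "geometry_msgs/Vector3",
        "angular": "geometry_msgs/Vector3",
    },
}

# A subset of ROS primitive types mapped to JSON Schema types.
PRIMITIVES = {
    "bool": "boolean",
    "float32": "number",
    "float64": "number",
    "int32": "integer",
    "int64": "integer",
    "string": "string",
}


def type_to_schema(ros_type):
    """Recursively convert a ROS type name into a JSON Schema fragment."""
    if ros_type in PRIMITIVES:
        return {"type": PRIMITIVES[ros_type]}
    fields = MESSAGE_DEFS[ros_type]
    return {
        "type": "object",
        "properties": {name: type_to_schema(t) for name, t in fields.items()},
        "required": list(fields),
    }


def make_tool(topic, ros_type):
    """Build an MCP-style tool descriptor for publishing to a topic."""
    return {
        "name": "publish_" + topic.strip("/").replace("/", "_"),
        "description": f"Publish a {ros_type} message on {topic}.",
        "inputSchema": type_to_schema(ros_type),
    }


tool = make_tool("/cmd_vel", "geometry_msgs/Twist")
print(json.dumps(tool, indent=2))
```

Because the descriptor is generated from the message definition rather than written by hand, supporting a new topic or message type in this sketch only requires adding its definition; the agent-facing schema follows automatically.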