The rapid emergence of Vision-Language-Action models (VLAs) has had a significant impact on robotics. However, their deployment remains complex due to fragmented interfaces and the communication latency inherent in distributed setups. To address this, we introduce VLAgents, a modular policy server that abstracts VLA inference behind a unified Gymnasium-style protocol. Crucially, its communication layer adapts transparently to the deployment context, supporting both zero-copy shared memory for high-speed simulation and compressed streaming for remote hardware. In this work, we present the architecture of VLAgents and validate it by integrating seven policies -- including OpenVLA and Pi Zero. In a benchmark covering both local and remote communication, we further demonstrate that it outperforms the default policy servers provided by OpenVLA, OpenPi, and LeRobot. VLAgents is available at https://github.com/RobotControlStack/vlagents
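To make the "Gymnasium-style protocol" concrete, the following is a minimal sketch of what wrapping a policy behind reset()/step()-style calls might look like. All names here (DummyPolicy, GymStylePolicyServer, clear_history, predict) are illustrative assumptions, not the actual VLAgents API; a real server would additionally handle serialization over shared memory or a compressed stream.

```python
class DummyPolicy:
    """Stand-in for a VLA model: keeps an observation history and
    returns a fixed-dimensional zero action (pure-Python, no deps)."""

    def __init__(self, action_dim=7):
        self.action_dim = action_dim
        self.history = []

    def clear_history(self):
        # A real VLA policy would reset its context window here.
        self.history = []

    def predict(self, observation):
        self.history.append(observation)
        return [0.0] * self.action_dim


class GymStylePolicyServer:
    """Hypothetical Gymnasium-style wrapper: reset() clears policy
    state at episode boundaries, and step() maps one observation
    dict to one action vector, mirroring the env.reset()/env.step()
    calling convention familiar from Gymnasium."""

    def __init__(self, policy):
        self.policy = policy

    def reset(self):
        self.policy.clear_history()

    def step(self, observation):
        return self.policy.predict(observation)


# Usage: drive the policy like a Gym environment.
server = GymStylePolicyServer(DummyPolicy(action_dim=7))
server.reset()
action = server.step({"image": None, "instruction": "pick up the cube"})
```

Keeping the client-facing interface this small is what lets the communication layer be swapped (zero-copy locally, compressed streaming remotely) without changing policy or client code.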