Assistax: A Multi-Agent Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics

Leonard Hinckeldey, Elliot Fosong, Rimvydas Rubavicius, Elle Miller, Trevor McInroe, Fan Zhang, Patricia Wollstadt, Stefano V. Albrecht, Subramanian Ramamoorthy

From arXiv. Accepted at the Reinforcement Learning Conference 2026.

The development of reinforcement learning (RL) algorithms has been largely driven by ambitious challenge tasks and benchmarks. Games have dominated RL benchmarks because they present relevant challenges, are inexpensive to run, and are easy to understand. While games such as Go and Atari have led to many breakthroughs, these results often do not translate directly to real-world embodied applications. Recognising the need to diversify RL benchmarks and to address the complexities that arise in embodied interaction scenarios, we introduce Assistax: an open-source benchmark designed to address challenges arising in assistive robotics tasks. Assistax uses JAX's hardware acceleration to achieve significant speed-ups when learning in physics-based simulations. In terms of open-loop wall-clock time, Assistax runs up to $370\times$ faster than CPU-based alternatives when vectorising training runs. Assistax conceptualises the interaction between an assistive robot and an active human patient using multi-agent RL (MARL), training a population of diverse partner agents against which an embodied robotic agent's zero-shot coordination capabilities can be tested. Extensive evaluation and hyperparameter tuning for popular continuous control RL and MARL algorithms provide reliable baselines and establish Assistax as a practical benchmark for advancing RL research in assistive robotics. The code is available at: https://github.com/assistive-autonomy/assistax.
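
The speed-up from vectorising training runs rests on a general JAX pattern: write one training run as a pure function of a random key, then map it over many keys and compile the batch once. The following is a minimal sketch of that pattern, assuming a toy quadratic objective in place of an RL training loop. The function `train_run`, its parameters, and the target vector are hypothetical illustrations and are not part of the Assistax API.

```python
import jax
import jax.numpy as jnp


def train_run(key, num_steps=100, lr=0.1):
    """One toy "training run": gradient descent on a quadratic loss.

    Hypothetical stand-in for a full RL training loop; not Assistax code.
    """
    target = jnp.array([1.0, -2.0, 0.5])
    params = jax.random.normal(key, (3,))  # seed-dependent initialisation

    def loss_fn(p):
        return jnp.sum((p - target) ** 2)

    def step(p, _):
        # One gradient-descent update; also record the loss after the update.
        p = p - lr * jax.grad(loss_fn)(p)
        return p, loss_fn(p)

    # lax.scan keeps the whole loop inside a single compiled program.
    final_params, losses = jax.lax.scan(step, params, None, length=num_steps)
    return final_params, losses


# Vectorise over 8 independent seeds, then compile the batched function once.
keys = jax.random.split(jax.random.PRNGKey(0), 8)
batched_train = jax.jit(jax.vmap(train_run))
final_params, losses = batched_train(keys)

print(final_params.shape)  # one parameter vector per seed
print(losses.shape)        # one loss curve per seed
```

Because all runs execute as one batched computation on the accelerator, adding seeds (or hyperparameter settings) typically costs far less than running them sequentially. The actual gain depends on the hardware and the workload.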
