Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent's memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial observability and ensuring robust performance, yet no standardized benchmarks exist. To address this, we introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL, with three key contributions: (1) we propose a classification framework for memory-intensive RL tasks, (2) we collect MIKASA-Base, a unified benchmark that enables systematic evaluation of memory-enhanced agents across diverse scenarios, and (3) we develop MIKASA-Robo (pip install mikasa-robo-suite), a novel benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation. Together, these contributions provide a unified framework for advancing memory RL research and enabling more robust systems for real-world use. MIKASA is available at https://tinyurl.com/membenchrobots.