Choosing an appropriate representation of the environment for the underlying decision-making process of a reinforcement learning agent is not always straightforward. The state representation should be rich enough to let the agent make informed decisions about its actions, yet disentangled enough to simplify policy training and the corresponding sim2real transfer. With this outlook, this work examines how different state representations affect the agent's ability to solve a specific robotic task: antipodal, planar object grasping. A continuum of state representations is defined, ranging from hand-crafted numerical states to encoded image-based representations, with decreasing levels of embedded task-specific knowledge. The effect of each representation on the agent's ability to solve the task in simulation, and on the transferability of the learned policy to the real robot, is examined and compared against a model-based approach with complete system knowledge. The results show that reinforcement learning agents using numerical states can perform on par with non-learning baselines. Furthermore, we find that agents using image-based representations built from pre-trained environment embedding vectors outperform end-to-end trained agents, and we hypothesize that separating representation learning from reinforcement learning can benefit sim2real transfer. Finally, we conclude that enriching the state representation with task-specific knowledge facilitates faster convergence during agent training and increases success rates in sim2real robot control.