Choosing an appropriate representation of the environment for the underlying decision-making process of an RL agent is not always straightforward. The state representation should be informative enough for the agent to make well-grounded action choices, yet compact enough to keep policy training sample-efficient. With this in mind, this work examines how different state representations affect the agent's ability to solve a specific robotic task: antipodal and planar object grasping. We define a continuum of state-representation abstractions, ranging from a model-based approach with complete system knowledge, through hand-crafted numerical representations, to image-based representations, with each step embedding less task-specific knowledge. We examine how each representation affects the agent's ability to solve the task in simulation and how well the learned policy transfers to the real robot. The results show that RL agents using numerical states can perform on par with non-learning baselines. Furthermore, we find that agents using image-based representations built on pre-trained environment embedding vectors outperform end-to-end trained agents, and we hypothesize that task-specific knowledge is necessary for achieving convergence and high success rates in robot control.