Peer learning is a novel high-level reinforcement learning framework for agents learning in groups. While standard reinforcement learning trains an individual agent in a trial-and-error fashion, entirely on its own, peer learning addresses a related setting in which a group of agents, i.e., peers, simultaneously learns to master a task from scratch. Peers may communicate only about their own states and about the actions that others recommend: "What would you do in my situation?". Our motivation is to study the learning behavior of these agents. We formalize the teacher selection process in the action advice setting as a multi-armed bandit problem and thereby highlight the need for exploration. We then analyze the learning behavior of the peers and observe their ability to rank the agents' performance within the study group and to identify which agents give reliable advice. Further, we compare peer learning with single-agent learning and a state-of-the-art action advice baseline. We show that peer learning is able to outperform single-agent learning and the baseline in several challenging discrete and continuous OpenAI Gym domains. In doing so, we also show that, within such a framework, complex policies can evolve from action recommendations even beyond discrete action spaces.
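To make the bandit view of teacher selection concrete, the sketch below treats each peer as an arm and lets one agent choose whose advice to follow. It is a minimal illustration under stated assumptions, not the method described here: the choice of UCB1 as the bandit algorithm, the class name `UCB1TeacherSelector`, the helper `simulate`, and the binary "was the advice useful" feedback signal with fixed per-peer quality are all hypothetical stand-ins. The exploration bonus is what prevents an agent from locking onto whichever peer happened to look good early on.

```python
import math
import random
from typing import List


class UCB1TeacherSelector:
    """Hypothetical UCB1 bandit in which each arm is a peer that can give advice."""

    def __init__(self, n_peers: int, c: float = 2.0) -> None:
        self.counts = [0] * n_peers    # how often each peer's advice was followed
        self.values = [0.0] * n_peers  # running mean of the feedback per peer
        self.c = c                     # c = 2.0 gives the classic UCB1 bonus
        self.t = 0                     # total number of selections so far

    def select(self) -> int:
        self.t += 1
        # Ask every peer once before applying the UCB rule.
        for peer, n in enumerate(self.counts):
            if n == 0:
                return peer
        # Estimated advice quality plus an exploration bonus that shrinks
        # as a peer is consulted more often.
        scores = [
            v + math.sqrt(self.c * math.log(self.t) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, peer: int, feedback: float) -> None:
        # Incremental mean update of the chosen peer's estimated advice quality.
        self.counts[peer] += 1
        self.values[peer] += (feedback - self.values[peer]) / self.counts[peer]


def simulate(advice_quality: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Hypothetical environment: peer i's advice is useful with probability advice_quality[i]."""
    rng = random.Random(seed)
    selector = UCB1TeacherSelector(len(advice_quality))
    for _ in range(rounds):
        peer = selector.select()
        # Assumed binary feedback: 1.0 if following the advice paid off, else 0.0.
        feedback = 1.0 if rng.random() < advice_quality[peer] else 0.0
        selector.update(peer, feedback)
    return selector.counts


counts = simulate([0.2, 0.5, 0.8], rounds=2000)
print(counts)  # the peer with the most reliable advice (index 2) is consulted most
```

Because every peer keeps receiving occasional queries, the per-peer estimates in `values` also yield a rough ranking of the peers, loosely mirroring the ranking behavior described above.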