Reinforcement Learning (RL) promises to be an enabling technology for many applications. However, since most of the literature in the field currently focuses on opaque models, the use of RL in high-stakes scenarios, where interpretability is crucial, is still limited. Recently, some approaches to interpretable RL, e.g., based on Decision Trees, have been proposed, but one of the main limitations of these techniques is their training cost. To overcome this limitation, we propose a new population-based method, called Social Interpretable RL (SIRL), which draws on social learning principles to improve learning efficiency. Our method mimics a social learning process, in which each agent in a group learns to solve a given task based both on its own individual experience and on the experience it acquires together with its peers. Our approach is divided into two phases. In the \emph{collaborative phase}, all the agents in the population interact with a shared instance of the environment: each agent observes the state and independently proposes an action, and a vote then determines the action that is actually executed in the environment. In the \emph{individual phase}, each agent refines its individual performance by interacting with its own instance of the environment. This mechanism lets the agents experience a larger number of episodes while reducing the computational cost of the process. Our results on six well-known benchmarks show that SIRL reaches state-of-the-art performance with respect to the alternative interpretable methods from the literature.
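The two-phase loop described above can be illustrated with a minimal sketch. The abstract does not specify the voting rule, the learners, or the benchmarks, so everything here is an assumption made for illustration: plurality voting with random tie-breaking, a hypothetical toy `ChainEnv` task, and tabular Q-learning agents (`TabularAgent`) standing in for the interpretable policies. In the collaborative phase every agent learns from the shared, voted transition; in the individual phase each agent runs its own episode in a fresh environment instance.

```python
import random
from collections import Counter


def majority_vote(proposals, rng):
    """Plurality vote over proposed actions; ties are broken uniformly at random (assumed rule)."""
    counts = Counter(proposals)
    top = max(counts.values())
    winners = sorted(a for a, c in counts.items() if c == top)
    return rng.choice(winners)


class ChainEnv:
    """Hypothetical toy task: move right along a chain of n states to reach the goal."""

    def __init__(self, n=5):
        self.n = n

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):  # 0 = left, 1 = right
        self.s = max(0, min(self.n - 1, self.s + (1 if action == 1 else -1)))
        done = self.s == self.n - 1
        return self.s, (1.0 if done else 0.0), done


class TabularAgent:
    """Hypothetical stand-in learner (tabular Q-learning), not SIRL's interpretable model."""

    def __init__(self, n_states, n_actions, rng, eps=0.2, alpha=0.5, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.rng, self.eps, self.alpha, self.gamma = rng, eps, alpha, gamma

    def act(self, s):
        # Epsilon-greedy proposal; ties between equal Q-values are broken at random.
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.q[s]))
        best = max(self.q[s])
        return self.rng.choice([a for a, v in enumerate(self.q[s]) if v == best])

    def update(self, s, a, r, s2, done):
        # Off-policy Q-learning update, so learning from the voted action is well defined.
        target = r if done else r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


def run_episode(env, agents, rng, max_steps=50):
    """Run one episode; the executed action is the vote over all participating agents."""
    s = env.reset()
    for _ in range(max_steps):
        a = majority_vote([ag.act(s) for ag in agents], rng)
        s2, r, done = env.step(a)
        for ag in agents:  # every participant learns from the shared transition
            ag.update(s, a, r, s2, done)
        s = s2
        if done:
            return True
    return False


def train(n_agents=5, iterations=20, seed=0):
    rng = random.Random(seed)
    agents = [TabularAgent(5, 2, rng) for _ in range(n_agents)]
    for _ in range(iterations):
        run_episode(ChainEnv(), agents, rng)  # collaborative phase: one shared instance
        for ag in agents:                     # individual phase: one instance per agent
            run_episode(ChainEnv(), [ag], rng)
    return agents
```

Calling `agents = train()` returns the trained population. With a single agent, `majority_vote` simply returns that agent's own proposal, so the same `run_episode` helper serves both phases; note that one collaborative episode gives every agent a learning signal for the cost of a single environment rollout.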