Model-Free Reinforcement Learning (RL) algorithms either learn how to map states to expected rewards or search for policies that maximize a given performance function. Model-Based algorithms, instead, aim to learn an approximation of the underlying model of the RL environment and then use it in combination with planning algorithms. Upside-Down Reinforcement Learning (UDRL) is a novel learning paradigm that aims to learn how to predict actions from states and desired commands. This task is formulated as a Supervised Learning problem and has successfully been tackled by Neural Networks (NNs). In this paper, we investigate whether function approximation algorithms other than NNs can also be used within a UDRL framework. Our experiments, performed over several popular optimal control benchmarks, show that tree-based methods like Random Forests and Extremely Randomized Trees can perform just as well as NNs, with the significant benefit of yielding policies that are inherently more interpretable than NNs, thus paving the way for more transparent, safe, and robust RL.
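The UDRL formulation described above reduces policy learning to supervised learning: a function approximator is trained to map a state together with a desired command (e.g. a target return and a time horizon) to the action that was observed to achieve it. The sketch below illustrates this idea with a tree-based learner; it is a minimal, hypothetical example on synthetic data, not the paper's actual training procedure or benchmarks, and the command encoding and labeling rule are assumptions made purely for illustration.

```python
# Hedged sketch: the UDRL supervised step with a tree-based policy.
# A Random Forest maps (state, desired_return, desired_horizon) -> action.
# All data below is synthetic; the labeling rule is invented so the
# forest has a learnable pattern, and does not come from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic "replay" data: 4-dimensional states plus a 2-dimensional
# command vector (desired_return, desired_horizon).
states = rng.normal(size=(500, 4))
commands = rng.uniform(0.0, 10.0, size=(500, 2))
X = np.hstack([states, commands])              # inputs: state + command
y = (states[:, 0] + 0.1 * commands[:, 0] > 0).astype(int)  # binary action label

# The "policy" is an ordinary supervised classifier over (state, command).
policy = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Acting: condition the trained policy on the current state and a
# desired command, here "achieve return 8.0 within 5 steps".
obs = np.zeros(4)
command = np.array([8.0, 5.0])
action = policy.predict(np.hstack([obs, command]).reshape(1, -1))[0]
```

Because the policy is a forest of decision trees, each predicted action can be traced back to explicit threshold tests on state and command features, which is the interpretability advantage the abstract highlights over NN policies.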