We prove a fundamental limitation on the efficiency of a wide class of Reinforcement Learning (RL) algorithms. This limitation applies to model-free RL methods as well as to a broad range of model-based methods, such as planning with tree search. Under an abstract definition of this class, we construct a family of RL problems on which these methods require a number of interactions with the environment that is lower-bounded by a function exponential in the horizon in order to find an optimal behavior. However, there exists a method, not tailored to this specific family of problems, that solves every problem in the family efficiently. In contrast, our limitation does not apply to several types of methods proposed in the literature, for instance, goal-conditioned methods or other algorithms that construct an inverse dynamics model.