As the importance of real-time processing has become apparent in recent years, demand for efficient implementations of reinforcement learning (RL) algorithms has grown. Despite the numerous advantages of the Bellman equations used in RL algorithms, they come with a large search space of design parameters. This research sheds light on design space exploration for reinforcement learning parameters, specifically those of Policy Iteration. Given the high computational cost of fine-tuning the parameters of reinforcement learning algorithms, we propose an auto-tuner-based ordinal regression approach to accelerate the exploration of these parameters and, in turn, accelerate convergence towards an optimal policy. Our approach provides a 1.82x peak speedup and a 1.48x average speedup over the previous state-of-the-art.
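To make the notion of Policy Iteration design parameters concrete, the following is a minimal sketch of tabular Policy Iteration on a toy two-state problem. It is not the implementation evaluated in this work. The function name `policy_iteration`, the toy MDP, and the choice of the discount factor `gamma` and evaluation tolerance `theta` as the tunable parameters are all assumptions made for illustration; the abstract does not state which parameters are explored.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, theta=1e-8, max_iters=1000):
    """Tabular Policy Iteration (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, t] = Pr(next state t | state s, action a)
    R: array of shape (S, A), expected immediate reward
    gamma, theta: assumed design parameters (discount, evaluation tolerance)
    Returns (policy, V, sweeps), where sweeps counts evaluation backups.
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    V = np.zeros(n_states)
    sweeps = 0
    for _ in range(max_iters):
        # Policy evaluation: repeat the Bellman expectation backup
        # until the largest value change drops below theta.
        while True:
            P_pi = P[policy, states]   # (S, S) transitions under the policy
            R_pi = R[states, policy]   # (S,) rewards under the policy
            V_new = R_pi + gamma * P_pi @ V
            sweeps += 1
            delta = np.max(np.abs(V_new - V))
            V = V_new
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_policy = np.argmax(Q, axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V, sweeps

# Hypothetical toy MDP: action 0 stays put, action 1 switches state.
# Only staying in state 1 yields a reward.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])

policy, V, sweeps_tight = policy_iteration(P, R, gamma=0.9, theta=1e-8)
_, _, sweeps_loose = policy_iteration(P, R, gamma=0.9, theta=1e-2)
```

In this toy setting, loosening `theta` reduces the number of evaluation sweeps while the same greedy policy is recovered. This kind of cost versus accuracy trade-off is what makes parameter exploration expensive on larger problems.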