Hyperparameter optimization plays a key role in machine learning. It is especially important in reinforcement learning (RL), where agents continuously interact with and adapt to their environments, so their learning trajectories require dynamic adjustment. To accommodate this dynamic nature, Population-Based Training (PBT) was introduced, leveraging the collective intelligence of a population of agents learning simultaneously. However, PBT tends to favor high-performing agents, potentially neglecting the exploratory potential of agents on the brink of significant advancements. To mitigate this limitation, we present Generalized Population-Based Training (GPBT), a refined framework designed for finer granularity and greater flexibility in hyperparameter adaptation. Complementing GPBT, we further introduce Pairwise Learning (PL). Instead of focusing only on elite agents, PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents. By integrating GPBT and PL, our approach significantly improves upon traditional PBT in terms of adaptability and computational efficiency. Rigorous empirical evaluations across a range of RL benchmarks confirm that our approach consistently outperforms not only conventional PBT but also its Bayesian-optimized variant.
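To make the contrast between elite-driven and pairwise guidance concrete, the following is a minimal sketch under stated assumptions, not the update rule proposed in this work. `pbt_step` illustrates the commonly described PBT truncation selection, in which the worst agents copy and perturb the hyperparameters of the best ones. `pairwise_step` is a hypothetical pairwise variant in which agents are paired at random and the lower-scoring agent of each pair moves toward its partner. The function names, the agent representation (a dict with `score` and `hparams`), and the parameters `frac`, `perturb`, and `step` are all illustrative assumptions.

```python
import random


def pbt_step(population, rng, frac=0.25, perturb=(0.8, 1.2)):
    """Illustrative PBT-style truncation selection (assumed form).

    population: list of dicts with "score" (float) and "hparams" (dict of floats).
    The bottom `frac` of agents copy hyperparameters from a random agent in the
    top `frac` (exploit), then scale each value by a random factor (explore).
    """
    n = len(population)
    if n < 2:
        return population
    ranked = sorted(population, key=lambda a: a["score"], reverse=True)
    k = min(max(1, int(n * frac)), n // 2)  # keep top and bottom disjoint
    top, bottom = ranked[:k], ranked[-k:]
    for agent in bottom:
        donor = rng.choice(top)
        agent["hparams"] = {
            name: value * rng.choice(perturb)
            for name, value in donor["hparams"].items()
        }
    return population


def pairwise_step(population, rng, step=0.5):
    """Hypothetical pairwise update; an assumption, not the paper's exact rule.

    Agents are shuffled and paired. In each pair, the lower-scoring agent moves
    its hyperparameters a fraction `step` of the way toward the higher-scoring
    agent. With an odd population size, one agent is left unchanged.
    """
    order = list(range(len(population)))
    rng.shuffle(order)
    for i, j in zip(order[::2], order[1::2]):
        a, b = population[i], population[j]
        winner, loser = (a, b) if a["score"] >= b["score"] else (b, a)
        loser["hparams"] = {
            name: value + step * (winner["hparams"][name] - value)
            for name, value in loser["hparams"].items()
        }
    return population
```

In this sketch, every pair yields a learning signal, so an agent in the middle of the ranking can still be guided by a slightly better partner, whereas truncation selection only ever updates the bottom fraction of the population.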