Neural networks (NNs) achieve remarkable results across a wide range of tasks, but they lack key characteristics: interpretability, support for categorical features, and lightweight implementations suitable for edge devices. While ongoing efforts aim to address these challenges, Gradient Boosting Trees (GBTs) inherently meet these requirements. As a result, GBTs have become the go-to method for supervised learning tasks in many real-world applications and competitions. However, their application in online learning scenarios, notably in reinforcement learning (RL), has been limited. In this work, we bridge this gap by introducing Gradient-Boosting RL (GBRL), a framework that extends the advantages of GBTs to the RL domain. Using the GBRL framework, we implement various actor-critic algorithms and compare their performance with their NN counterparts. Inspired by shared backbones in NNs, we introduce a tree-sharing approach in which the policy and value functions share the same trees but use distinct learning rates, enhancing learning efficiency over millions of interactions. GBRL achieves competitive performance across a diverse array of tasks, excelling in domains with structured or categorical features. Additionally, we present a high-performance, GPU-accelerated implementation that integrates seamlessly with widely used RL libraries (available at https://github.com/NVlabs/gbrl). GBRL expands the toolkit for RL practitioners and demonstrates the viability and promise of GBTs within the RL paradigm.
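The tree-sharing idea mentioned above can be illustrated with a minimal sketch. This is not the GBRL library's API or the authors' implementation: the names `SharedStump`, `fit_shared_stump`, and `SharedEnsemble` are hypothetical, the trees are depth-1 stumps fit by squared error to negative gradients, and the learning-rate values are arbitrary. The sketch only assumes that each shared tree stores one leaf value per output (policy logits plus a value estimate) and that each output is scaled by its own learning rate.

```python
import numpy as np


class SharedStump:
    """Depth-1 tree whose two leaves each store one value per output."""

    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

    def predict(self, X):
        goes_left = X[:, self.feature] <= self.threshold
        return np.where(goes_left[:, None], self.left, self.right)


def fit_shared_stump(X, grads):
    """Pick the single split that best fits the negative gradients of all outputs."""
    target = -grads
    mean = target.mean(axis=0)
    # Fallback when no feature can be split: one constant leaf.
    best_err, best = np.inf, SharedStump(0, np.inf, mean, mean)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:  # drop the max so the right leaf is non-empty
            mask = X[:, f] <= t
            left = target[mask].mean(axis=0)
            right = target[~mask].mean(axis=0)
            err = ((target[mask] - left) ** 2).sum() + ((target[~mask] - right) ** 2).sum()
            if err < best_err:
                best_err, best = err, SharedStump(f, t, left, right)
    return best


class SharedEnsemble:
    """Policy and value heads share every tree but use distinct learning rates."""

    def __init__(self, n_actions, policy_lr=0.1, value_lr=0.01):
        # One learning rate per output column: policy logits first, value last.
        self.lr = np.array([policy_lr] * n_actions + [value_lr])
        self.trees = []

    def predict(self, X):
        out = np.zeros((X.shape[0], self.lr.size))
        for tree in self.trees:
            out += self.lr * tree.predict(X)
        return out[:, :-1], out[:, -1]  # (policy logits, value estimates)

    def boost(self, X, policy_grad, value_grad):
        """Add one tree fit jointly to the policy and value loss gradients."""
        grads = np.column_stack([policy_grad, value_grad])
        self.trees.append(fit_shared_stump(X, grads))
```

In an actor-critic loop under these assumptions, `policy_grad` and `value_grad` would be the gradients of the policy and value losses with respect to the current ensemble outputs on a batch of observations `X`, and each call to `boost` would add one shared tree.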