We propose a new framework for contextual multi-armed bandits based on tree ensembles. The framework adapts two widely used bandit methods, Upper Confidence Bound (UCB) and Thompson Sampling (TS), to both the standard and the combinatorial settings. As part of this framework, we propose a novel method for estimating the uncertainty in tree ensemble predictions. We demonstrate the effectiveness of the framework in several experimental studies using XGBoost and random forests, two popular tree ensemble methods. On benchmark datasets and on the real-world application of navigation over road networks, our methods outperform state-of-the-art methods based on decision trees and neural networks in terms of both regret minimization and computational runtime.
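To illustrate how an ensemble's predictive uncertainty can drive UCB and TS arm selection, here is a minimal sketch. The abstract does not specify the proposed uncertainty estimator. This sketch therefore uses a generic, hypothetical stand-in: the mean and sample standard deviation of the individual trees' predictions for each arm. It is not the authors' method. All function names (`ensemble_mean_std`, `ucb_select`, `ts_select`) and the exploration weight `beta` are illustrative assumptions. Rewards are assumed to be maximized. For a cost such as travel time, one would negate the predictions.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def ensemble_mean_std(tree_preds: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample std of per-tree predictions for one arm.

    Hypothetical uncertainty proxy (spread across trees), not the paper's estimator.
    """
    mean = statistics.fmean(tree_preds)
    std = statistics.stdev(tree_preds) if len(tree_preds) > 1 else 0.0
    return mean, std


def ucb_select(per_arm_tree_preds: List[Sequence[float]], beta: float = 1.0) -> int:
    """UCB-style choice: index of the arm maximizing mean + beta * std."""
    scores = []
    for preds in per_arm_tree_preds:
        mean, std = ensemble_mean_std(preds)
        scores.append(mean + beta * std)  # optimism in the face of uncertainty
    return max(range(len(scores)), key=scores.__getitem__)


def ts_select(per_arm_tree_preds: List[Sequence[float]], rng: random.Random) -> int:
    """Gaussian Thompson-sampling choice: index of the arm with the largest draw
    from N(mean, std^2), using the hypothetical per-tree mean/std above."""
    draws = []
    for preds in per_arm_tree_preds:
        mean, std = ensemble_mean_std(preds)
        draws.append(rng.gauss(mean, std))  # sigma = 0 simply returns the mean
    return max(range(len(draws)), key=draws.__getitem__)


if __name__ == "__main__":
    # Two arms with equal mean prediction; arm 1's trees disagree more.
    preds = [[1.0, 1.0, 1.0], [0.8, 0.9, 1.3]]
    print(ucb_select(preds, beta=1.0))  # exploration bonus favors arm 1
    print(ts_select(preds, random.Random(0)))
```

In a full bandit loop, `per_arm_tree_preds` would come from the fitted ensemble's per-tree outputs for each arm's context. The ensemble would then be refit as rewards are observed. In the combinatorial setting, the per-arm scores or samples would instead be passed to a combinatorial solver, for example a shortest-path routine over edge costs.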