Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling (TS) is a preferred strategy for these problems, its performance hinges on the quality of the underlying reward model. Standard linear models suffer from high bias, while neural network approaches are often brittle and difficult to tune in online settings. Conversely, tree ensembles dominate tabular data prediction but typically rely on heuristic uncertainty quantification, lacking a principled probabilistic basis for TS. We propose Bayesian Forest Thompson Sampling (BFTS), the first contextual bandit algorithm to integrate Bayesian Additive Regression Trees (BART), a fully probabilistic sum-of-trees model, directly into the exploration loop. We prove that BFTS is theoretically sound, deriving an information-theoretic Bayesian regret bound of $\tilde{O}(\sqrt{T})$. As a complementary result, we establish frequentist minimax optimality for a "feel-good" variant, confirming the structural suitability of BART priors for non-parametric bandits. Empirically, BFTS achieves state-of-the-art regret on tabular benchmarks with near-nominal uncertainty calibration. Furthermore, in an offline policy evaluation on the Drink Less micro-randomized trial, BFTS improves engagement rates by over 30% compared to the deployed policy, demonstrating its practical effectiveness for behavioral interventions.
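To make the exploration loop described in the abstract concrete, the following is a minimal sketch of contextual Thompson Sampling, not the authors' BFTS implementation. It swaps BART for a much simpler stand-in: a fixed partition of a one-dimensional context into bins (a single fixed "tree"), with a conjugate Normal posterior over the mean reward in each (arm, bin) cell and an assumed known noise variance. The names `LeafPosteriorTS`, `run`, and `true_mean`, the simulated environment, and all parameter values are hypothetical and chosen only for illustration. In BFTS, the per-round posterior draw would instead come from the BART sum-of-trees posterior.

```python
import random


class LeafPosteriorTS:
    """Thompson Sampling over a fixed partition of a context x in [0, 1).

    Each (arm, bin) cell holds a conjugate Normal posterior over its mean
    reward, assuming a N(0, prior_var) prior and known noise variance.
    This is an illustrative stand-in for a BART posterior, not BART itself.
    """

    def __init__(self, n_arms, n_bins, prior_var=1.0, noise_var=0.25, seed=0):
        self.n_arms = n_arms
        self.n_bins = n_bins
        self.prior_var = prior_var
        self.noise_var = noise_var
        self.rng = random.Random(seed)
        # Sufficient statistics per cell: observation count and reward sum.
        self.counts = [[0] * n_bins for _ in range(n_arms)]
        self.sums = [[0.0] * n_bins for _ in range(n_arms)]

    def _bin(self, x):
        # Map the context to the index of its partition cell.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def _posterior(self, arm, b):
        # Normal-Normal conjugate update with known noise variance.
        precision = 1.0 / self.prior_var + self.counts[arm][b] / self.noise_var
        mean = (self.sums[arm][b] / self.noise_var) / precision
        return mean, 1.0 / precision

    def select(self, x):
        # Draw one plausible mean reward per arm, then act greedily on the draw.
        b = self._bin(x)
        draws = []
        for arm in range(self.n_arms):
            mean, var = self._posterior(arm, b)
            draws.append(self.rng.gauss(mean, var ** 0.5))
        return max(range(self.n_arms), key=draws.__getitem__)

    def update(self, x, arm, reward):
        b = self._bin(x)
        self.counts[arm][b] += 1
        self.sums[arm][b] += reward


def true_mean(x, arm):
    # Hypothetical non-linear reward: arm 1 is better only for mid-range x.
    in_middle = 0.25 <= x < 0.75
    return 0.8 if (arm == 1) == in_middle else 0.2


def run(T=2000, seed=0):
    """Simulate T rounds and return (cumulative expected regret, agent)."""
    env = random.Random(seed + 1)
    agent = LeafPosteriorTS(n_arms=2, n_bins=4, seed=seed)
    regret = 0.0
    for _ in range(T):
        x = env.random()
        arm = agent.select(x)
        reward = true_mean(x, arm) + env.gauss(0.0, 0.5)
        agent.update(x, arm, reward)
        best = max(true_mean(x, a) for a in range(agent.n_arms))
        regret += best - true_mean(x, arm)
    return regret, agent


if __name__ == "__main__":
    total_regret, _ = run()
    print(f"cumulative regret over 2000 rounds: {total_regret:.1f}")
```

The sketch works only because the partition is fixed in advance and happens to align with the simulated reward structure. A sum-of-trees posterior such as BART's also places uncertainty over the partition itself, which is what makes it a candidate reward model for non-linear, tabular contexts.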