Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite-sample regimes. The proposed method is robust to small sample sizes and provides accurate uncertainty measures for policy value evaluation. It also allows flexible inference on policy comparisons with full uncertainty quantification. We demonstrate the effectiveness of the method through Monte Carlo simulations and an application to an adolescent body mass index data set.