Many real-world bandit applications are characterized by sparse rewards, which can significantly hinder learning efficiency. Leveraging problem-specific structures for careful distribution modeling is recognized as essential for improving estimation efficiency in statistics. However, this approach remains under-explored in the context of bandits. To address this gap, we initiate the study of zero-inflated bandits, where the reward is modeled using a classic semi-parametric distribution known as the zero-inflated distribution. We develop algorithms based on the Upper Confidence Bound and Thompson Sampling frameworks for this specific structure. The superior empirical performance of these methods is demonstrated through extensive numerical studies.
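To make the zero-inflated structure concrete, the sketch below models a reward as zero with probability 1 - p and otherwise as a draw from a nonzero component, so the mean reward is p times the mean of that component. It then pairs this with a simple UCB-style index that bounds the two components separately and multiplies the bounds. This is an illustration only, not the algorithm developed in the paper: the names `draw_reward`, `ZeroInflatedArm`, and `run`, the Hoeffding-style bonuses, and the assumption that nonzero rewards lie in (0, 1] are all hypothetical choices made for the example.

```python
import math
import random


def draw_reward(p_nonzero, draw_nonzero, rng):
    """Zero-inflated reward: 0 with probability 1 - p_nonzero, else a nonzero draw."""
    return draw_nonzero(rng) if rng.random() < p_nonzero else 0.0


class ZeroInflatedArm:
    """Running statistics for one arm, tracking the two components separately."""

    def __init__(self):
        self.pulls = 0           # total number of pulls
        self.nonzero_count = 0   # pulls that returned a nonzero reward
        self.nonzero_sum = 0.0   # sum of the nonzero rewards

    def update(self, reward):
        self.pulls += 1
        if reward != 0.0:
            self.nonzero_count += 1
            self.nonzero_sum += reward

    def ucb(self, t):
        """Product of upper bounds on P(reward != 0) and E[reward | reward != 0].

        Assumption for this sketch: nonzero rewards lie in (0, 1], so both
        bounds are clipped at 1.
        """
        if self.pulls == 0:
            return float("inf")  # force one initial pull of every arm
        bonus_p = math.sqrt(2.0 * math.log(t) / self.pulls)
        p_upper = min(1.0, self.nonzero_count / self.pulls + bonus_p)
        if self.nonzero_count == 0:
            mu_upper = 1.0  # no nonzero observation yet: use the trivial bound
        else:
            bonus_mu = math.sqrt(2.0 * math.log(t) / self.nonzero_count)
            mu_upper = min(1.0, self.nonzero_sum / self.nonzero_count + bonus_mu)
        return p_upper * mu_upper


def run(arms_spec, horizon, seed=0):
    """Play `horizon` rounds; arms_spec is a list of (p_nonzero, draw_nonzero) pairs."""
    rng = random.Random(seed)
    arms = [ZeroInflatedArm() for _ in arms_spec]
    for t in range(1, horizon + 1):
        k = max(range(len(arms)), key=lambda i: arms[i].ucb(t))
        p_nonzero, draw_nonzero = arms_spec[k]
        arms[k].update(draw_reward(p_nonzero, draw_nonzero, rng))
    return [arm.pulls for arm in arms]


if __name__ == "__main__":
    # Two sparse arms whose nonzero rewards are uniform on [0.5, 1.0].
    spec = [
        (0.1, lambda rng: rng.uniform(0.5, 1.0)),
        (0.3, lambda rng: rng.uniform(0.5, 1.0)),
    ]
    print(run(spec, horizon=2000))
```

Tracking the nonzero frequency and the nonzero mean separately is what distinguishes this index from a plain UCB on the raw rewards: the bound on the nonzero mean is driven only by the nonzero observations, while the bound on the nonzero probability uses every pull.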