We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses, where the algorithm has no prior knowledge of the magnitude of the losses. We present UMAB-NN and UMAB-G, two algorithms for non-negative and general unbounded losses, respectively. For non-negative unbounded losses, UMAB-NN achieves the first adaptive and scale-free regret bound without uniform exploration. Building on that, we further develop UMAB-G, which can learn from arbitrary unbounded losses. Our analysis reveals the asymmetry between positive and negative losses in the MAB problem and provides additional insights. We also accompany our theoretical findings with extensive empirical evaluations, showing that our algorithms consistently outperform all existing algorithms that handle unbounded losses.