We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions. Improving on previous analyses, we show that the regret in this setting is controlled by information-theoretic quantities that measure the similarity between experts. In some natural special cases, this allows us to obtain the first regret bound for EXP4 that can get arbitrarily close to zero if the experts are similar enough. For a different algorithm, we provide another bound that describes the similarity between the experts in terms of the KL-divergence, and we show that this bound can be smaller than that of EXP4 in some cases. Additionally, we provide lower bounds for certain classes of experts, showing that the algorithms we analyze are nearly optimal in some cases.