We prove that single-parameter natural exponential families with subexponential tails are self-concordant with polynomial-sized parameters. For subgaussian natural exponential families we establish an exact characterization of the growth rate of the self-concordance parameter. Applying these findings to bandits allows us to fill gaps in the literature: we show that optimistic algorithms for generalized linear bandits enjoy regret bounds that are both second-order (they scale with the variance of the optimal arm's reward distribution) and free of an exponential dependence on the bound on the problem parameter in the leading term. To the best of our knowledge, ours is the first regret bound for generalized linear bandits with subexponential tails, broadening the class of problems to include Poisson, exponential, and gamma bandits.
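As a rough illustration of the property discussed above, the following sketch evaluates the ratio $|A'''(\theta)| / A''(\theta)$ for two familiar single-parameter natural exponential families, where $A$ is the log-partition function. It assumes the form of self-concordance common in the generalized linear bandit literature, $|A'''(\theta)| \le M \, A''(\theta)$, which may differ in detail from the definition used in the paper. The function names are hypothetical, and the closed-form derivatives are the standard ones for the Poisson and Bernoulli families.

```python
import math


def sigmoid(x):
    """Logistic function, the mean of a Bernoulli with natural parameter x."""
    return 1.0 / (1.0 + math.exp(-x))


def poisson_ratio(theta):
    """|A'''| / A'' for the Poisson family, where A(theta) = exp(theta)."""
    second = math.exp(theta)  # A''(theta), the variance
    third = math.exp(theta)   # A'''(theta)
    return abs(third) / second


def bernoulli_ratio(theta):
    """|A'''| / A'' for the Bernoulli family, A(theta) = log(1 + exp(theta))."""
    s = sigmoid(theta)
    second = s * (1.0 - s)             # A''(theta), the variance
    third = second * (1.0 - 2.0 * s)   # A'''(theta)
    return abs(third) / second


# Hypothetical grid of natural parameters, chosen only for illustration.
thetas = [-5.0, -1.0, 0.0, 1.0, 5.0]
poisson = [poisson_ratio(t) for t in thetas]
bernoulli = [bernoulli_ratio(t) for t in thetas]

for t, p, b in zip(thetas, poisson, bernoulli):
    print(f"theta={t:+.1f}  Poisson ratio={p:.4f}  Bernoulli ratio={b:.4f}")
```

Under this assumed definition, both families admit $M = 1$ on the grid: the Poisson ratio is identically 1, and the Bernoulli ratio equals $|1 - 2\sigma(\theta)|$, which never exceeds 1.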