We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input belongs to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary question concerns the dependence on the number of labels $K$: can $T$-step regret bounds in this setting be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms? Our main contribution is to show that the minimax regret of bandit multiclass classification is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}\left(\min \left\{|\mathcal{H}| + \sqrt{T}, \sqrt{KT \log |\mathcal{H}|} \right\} \right)}$, where $\mathcal{H}$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|\mathcal{H}|+\sqrt{T})}$, improving over classical algorithms for moderately sized hypothesis classes, and we give a matching lower bound establishing that the upper bounds are tight (up to logarithmic factors) in all parameter regimes.
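To get a feel for when each term of the minimax rate dominates, the following minimal sketch evaluates both expressions numerically. It is an illustration only: constants and the logarithmic factors hidden by $\widetilde{\Theta}$ are dropped, and the function names and the example values of $K$, $T$ and $|\mathcal{H}|$ are assumptions chosen here, not taken from the paper.

```python
import math


def classical_rate(K: int, T: int, H: int) -> float:
    """sqrt(K * T * log|H|): the rate of existing algorithms (constants dropped)."""
    return math.sqrt(K * T * math.log(H))


def new_rate(T: int, H: int) -> float:
    """|H| + sqrt(T): the rate of the new algorithm (constants and logs dropped)."""
    return H + math.sqrt(T)


def minimax_rate(K: int, T: int, H: int) -> float:
    """The smaller of the two rates, mirroring the min{...} in the stated bound."""
    return min(new_rate(T, H), classical_rate(K, T, H))


# Hypothetical parameter values, for illustration only.
K, T = 1000, 10**6

# Moderately sized class: |H| + sqrt(T) is the smaller term.
moderate = minimax_rate(K, T, H=5000)

# Very large class: sqrt(K * T * log|H|) is the smaller term.
large = minimax_rate(K, T, H=10**6)
```

Up to the dropped factors, the first term is the smaller one roughly when $|\mathcal{H}|$ is at most $\sqrt{KT \log |\mathcal{H}|}$, which is one way to read "moderately sized" above.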