In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T\ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1+\ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with $q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved $\Omega\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ lower bound for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
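As a quick sanity check on the interpolation claim, one can substitute the two extreme values of $\alpha$ into the stated upper bound $\sqrt{\alpha T(1+\ln(K/\alpha))}$. The worked substitution below suppresses constants and is only an illustration of the claim, not part of the analysis itself:

$$
\alpha = 1:\ \sqrt{T\,(1+\ln K)} = \mathcal{O}\bigl(\sqrt{T\ln K}\bigr)\ \text{for } K \ge 2,
\qquad
\alpha = K:\ \sqrt{KT\,(1+\ln 1)} = \sqrt{KT}.
$$

The first case recovers the experts rate and the second the bandits rate. For $\alpha = K$ the stated lower bound $\sqrt{\alpha T(\ln K)/(\ln\alpha)}$ also reduces to $\sqrt{KT}$, since $(\ln K)/(\ln\alpha) = 1$.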