A recent line of work, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper-bounded by information measures. In most of these works, the expected generalization error converges at a rate of $O(\sqrt{\lambda/n})$, where $\lambda$ is an information-theoretic quantity such as the mutual information or conditional mutual information between the data and the learned hypothesis. However, such a rate is typically considered ``slow'' compared with the ``fast'' rate of $O(\lambda/n)$ attainable in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate: under appropriate assumptions, a fast-rate result can still be obtained from this bound. Furthermore, we identify the critical condition needed for fast-rate generalization, which we call the $(\eta,c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and the excess risk that achieve a fast convergence rate for specific learning algorithms such as empirical risk minimization and its regularized version. Finally, we present several analytical examples that demonstrate the effectiveness of the bounds.
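To make the "slow" versus "fast" distinction concrete, the sketch below compares the two rate shapes numerically. It assumes $\lambda$ is held fixed as $n$ grows and drops all constants. In practice, the mutual information between data and hypothesis may itself depend on $n$. The function names `slow_rate`, `fast_rate`, and `loglog_slope`, along with the value `lam = 2.0`, are hypothetical and for illustration only. The empirical exponent of $n$ is about $-1/2$ for $\sqrt{\lambda/n}$ and about $-1$ for $\lambda/n$. Whenever $\lambda/n < 1$, the fast-rate bound is the smaller of the two.

```python
import math


def slow_rate(lam: float, n: int) -> float:
    """Bound shape sqrt(lam / n); constants dropped (illustrative only)."""
    return math.sqrt(lam / n)


def fast_rate(lam: float, n: int) -> float:
    """Bound shape lam / n; constants dropped (illustrative only)."""
    return lam / n


def loglog_slope(rate, lam: float, n1: int, n2: int) -> float:
    """Empirical exponent of n: slope of log(rate) against log(n) between n1 and n2."""
    return (math.log(rate(lam, n2)) - math.log(rate(lam, n1))) / (
        math.log(n2) - math.log(n1)
    )


if __name__ == "__main__":
    lam = 2.0  # hypothetical fixed value of the information measure lambda
    for n in (100, 1_000, 10_000):
        print(
            f"n={n:>6}  sqrt(lam/n)={slow_rate(lam, n):.4f}  "
            f"lam/n={fast_rate(lam, n):.6f}"
        )
    # Slopes on a log-log scale: about -0.5 (slow) versus -1.0 (fast).
    print("slope (slow):", round(loglog_slope(slow_rate, lam, 100, 10_000), 3))
    print("slope (fast):", round(loglog_slope(fast_rate, lam, 100, 10_000), 3))
```

Going from $n=100$ to $n=10{,}000$ shrinks the $\sqrt{\lambda/n}$ bound by a factor of 10 but shrinks the $\lambda/n$ bound by a factor of 100. This gap is why the paper's question of when a square-root bound still yields a fast rate matters.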