Our main focus is the generalization bound, which upper-bounds the generalization error. We analyze regression and classification tasks separately. For regression, we assume the target function is real-valued and Lipschitz continuous, and we measure the discrepancy between predictions and ground truth using the 2-norm and a root-mean-square-error (RMSE) variant. For classification, we treat the target function as a one-hot classifier, i.e., a piecewise-constant function, and measure error with the 0/1 loss. Our analysis highlights the different sample complexities required to establish a concentration inequality for the generalization bound, revealing a gap in learning efficiency between regression and classification. Furthermore, we show that the generalization bounds for regression and classification decay as an inverse polynomial in the number of network parameters, with the degree of the polynomial determined by the hypothesis class and the network architecture. These results demonstrate the advantages of over-parameterized networks and clarify the conditions under which benign overfitting occurs in such systems.
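For concreteness, one standard instantiation of these error measures over a sample $x_1,\dots,x_n$ is the following; the exact RMSE variant used in our analysis may differ in normalization, and the symbols $f^*$ (target function), $h$ (classifier), and $y_i$ (labels) are illustrative notation rather than the paper's own:
\[
\mathrm{RMSE}(f) \;=\; \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i)-f^*(x_i)\bigr)^2},
\qquad
\mathcal{L}_{0/1}(h) \;=\; \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{h(x_i)\neq y_i\}.
\]
The inverse-polynomial dependence on model size can likewise be summarized schematically as $\text{generalization error} \lesssim C\,W^{-\alpha}$, where $W$ denotes the number of network parameters and the exponent $\alpha>0$ depends on the hypothesis class and the architecture ($C$ and $\alpha$ are illustrative constants, not the paper's notation).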