In this paper, we analyse the generalization capabilities of first-order methods for statistical learning in several distinct but related scenarios, including supervised learning, transfer learning, robust learning and federated learning. To do so, we provide sharp upper and lower bounds on the minimax excess risk of strongly convex and smooth statistical learning when the gradient is accessed through partial observations given by a data-dependent oracle. This novel class of oracles can query the gradient under any given data distribution, and is thus well suited to scenarios in which the training data distribution does not match the target (or test) distribution. In particular, our upper and lower bounds are proportional to the smallest mean square error achievable by gradient estimators, which allows us to derive sharp bounds in each of the aforementioned scenarios from the extensive literature on parameter estimation.