Obtaining guarantees on the convergence of the minimizers of empirical risks to those of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. Particular attention is therefore given to including unbounded spaces and non-convex cost functions that might also be unbounded. This work identifies a set of assumptions that describe a regime that seems to govern the concentration in many estimation problems, in which the empirical minimizers are stable. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric spaces with positive or negative curvature, subspaces of covariance matrices, regression problems and entropic-Wasserstein barycenters.