Obtaining guarantees on the convergence of the minimizers of empirical risks to those of the true risk is a fundamental problem in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper is to provide concentration inequalities on the distance between the sets of minimizers of the risks for a broad spectrum of estimation problems. In particular, the risks are defined on metric spaces through probability measures that are also supported on metric spaces. Particular attention is therefore given to including unbounded spaces and non-convex cost functions that may also be unbounded. This work identifies a set of assumptions describing a regime that seems to govern the concentration in many estimation problems, in which the empirical minimizers are stable. This stability can then be leveraged to prove parametric concentration rates in probability and in expectation. The assumptions are verified, and the bounds showcased, on a selection of estimation problems such as barycenters on metric spaces with positive or negative curvature, subspaces of covariance matrices, regression problems and entropic-Wasserstein barycenters.