Barycenters (also known as Fréchet means) were introduced in statistics in the 1940s and popularized in shape statistics and, later, in optimal transport and matrix analysis. Linear averaging is perhaps the most basic and widely used tool in data science, and barycenters provide its most natural extension to non-Euclidean geometries. Their asymptotic properties, such as laws of large numbers and central limit theorems, have been established in various setups, but their non-asymptotic behaviour is still not well understood. In this work, we prove finite-sample concentration inequalities (namely, generalizations of Hoeffding's and Bernstein's inequalities) for barycenters of i.i.d. random variables in metric spaces with non-positive curvature in Alexandrov's sense. As a byproduct, we obtain PAC guarantees for a stochastic online algorithm that computes the barycenter of a finite collection of points in a non-positively curved space. We also discuss extensions of our results to spaces with possibly positive curvature.
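To make the idea of a stochastic online barycenter computation concrete, here is a minimal sketch. It is not taken from the paper. It assumes a deliberately simple non-positively curved space, the positive reals with the metric d(x, y) = |log x - log y|, where the barycenter of a finite set of points is their geometric mean. It also assumes an inductive-mean update with step size 1/(k+1), which may differ from the algorithm the authors analyse. The function names `geodesic`, `online_barycenter`, and `exact_barycenter` are hypothetical.

```python
import math
import random


def geodesic(x, y, t):
    """Point at fraction t along the geodesic from x to y.

    Assumed space: (0, inf) with d(x, y) = |log x - log y|, which is
    isometric to the real line and hence non-positively curved.
    """
    return x ** (1.0 - t) * y ** t


def online_barycenter(points, n_steps, seed=0):
    """Inductive-mean sketch (an assumption, not the paper's exact scheme).

    At step k, draw one point uniformly at random and move the current
    estimate a fraction 1/(k+1) of the way towards it along the geodesic.
    """
    rng = random.Random(seed)
    estimate = rng.choice(points)
    for k in range(1, n_steps):
        sample = rng.choice(points)
        estimate = geodesic(estimate, sample, 1.0 / (k + 1))
    return estimate


def exact_barycenter(points):
    """Closed-form barycenter in this space: the geometric mean."""
    return math.exp(sum(math.log(p) for p in points) / len(points))


if __name__ == "__main__":
    pts = [1.0, 2.0, 4.0, 8.0]
    print("online estimate:", online_barycenter(pts, n_steps=20000))
    print("exact barycenter:", exact_barycenter(pts))
```

In this toy space the update reduces to a running average of logarithms, so the estimate after n steps is the geometric mean of n sampled points. In a general non-positively curved space only the `geodesic` function would change, and no closed form for the barycenter is typically available.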