Classical statistical theory has been developed under the assumption that the data belong to a linear space. However, in many applications the intrinsic geometry of the data is more intricate. Neglecting this frequently yields suboptimal or outright unusable results; for example, taking the pixel-wise average of images typically results in noise. Incorporating the intrinsic geometry of a dataset into statistical analysis is a highly non-trivial task. In fact, different underlying geometries necessitate different approaches and allow for results of varying strength. Perhaps the most common non-linear geometries appearing in statistical applications are metric spaces of non-positive curvature, such as the manifold of symmetric, positive (semi-)definite matrices. In this paper we introduce a (strong) law of large numbers for independent, but not necessarily identically distributed, random variables taking values in complete spaces of non-positive curvature. Using this law of large numbers, we justify a stochastic approximation scheme for the limit of Fr\'echet means on such spaces. Apart from making the computation of Fr\'echet means more tractable, the structure of this scheme suggests that averaging operations on Hadamard spaces are more stable than previous results would indicate.
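To make the idea of a stochastic approximation of a Fr\'echet mean concrete, the following is a minimal sketch and not the scheme analysed in the paper. It assumes the space of symmetric positive definite matrices with the affine-invariant metric, a standard example of a Hadamard space, and uses the well-known inductive-mean recursion: start at the first sample, then move from the current estimate toward each new sample along the geodesic with step size 1/(n+1). The function names `spd_power`, `geodesic`, and `inductive_mean` are hypothetical and chosen only for illustration.

```python
import numpy as np


def spd_power(a, t):
    """Return a**t for a symmetric positive definite matrix a (assumed SPD)."""
    w, v = np.linalg.eigh(a)      # a = v @ diag(w) @ v.T with w > 0
    return (v * w**t) @ v.T       # scale eigenvector columns by w**t


def geodesic(a, b, t):
    """Point at time t on the affine-invariant geodesic from a (t=0) to b (t=1)."""
    a_half = spd_power(a, 0.5)
    a_inv_half = spd_power(a, -0.5)
    middle = a_inv_half @ b @ a_inv_half
    middle = (middle + middle.T) / 2   # remove round-off asymmetry before eigh
    return a_half @ spd_power(middle, t) @ a_half


def inductive_mean(samples):
    """Illustrative inductive mean: S_1 = X_1, S_{n+1} = geodesic(S_n, X_{n+1}, 1/(n+1))."""
    it = iter(samples)
    s = next(it)
    for n, x in enumerate(it, start=1):
        s = geodesic(s, x, 1.0 / (n + 1))
    return s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: random SPD matrices built as g @ g.T + identity.
    data = []
    for _ in range(200):
        g = rng.normal(size=(3, 3))
        data.append(g @ g.T + np.eye(3))
    print(inductive_mean(data))
```

Each update needs only the current estimate and one new sample, so no optimisation over the whole dataset is required. For commuting (for example, diagonal) matrices the recursion reduces to the entrywise geometric mean, which gives a simple sanity check.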