Data sets sampled in Lie groups are widespread, and as with multivariate data, it is important for many applications to assess the differences between the sets in terms of their distributions. Indices for this task are usually derived by considering the Lie group as a Riemannian manifold. Then, however, compatibility with the group operation is guaranteed only if a bi-invariant metric exists, which is not the case for most non-compact and non-commutative groups. We show here that if one considers an affine connection structure instead, one obtains bi-invariant generalizations of well-known dissimilarity measures: a Hotelling $T^2$ statistic, Bhattacharyya distance and Hellinger distance. Each of the dissimilarity measures matches its multivariate counterpart for Euclidean data and is translation-invariant, so that biases, e.g., through an arbitrary choice of reference, are avoided. We further derive non-parametric two-sample tests that are bi-invariant and consistent. We demonstrate the potential of these dissimilarity measures by performing group tests on data of knee configurations and epidemiological shape data. Significant differences are revealed in both cases.
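To make the Euclidean reference point concrete, the following is a minimal sketch of the classical multivariate two-sample Hotelling $T^2$ statistic, together with the Bhattacharyya and Hellinger distances between Gaussians fitted to each sample. It does not implement the Lie group construction described above; it only illustrates the multivariate counterparts that the generalized measures are said to match, and checks numerically that they are unchanged when both samples are shifted by the same vector. The function names and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def hotelling_t2(x, y):
    """Classical two-sample Hotelling T^2 for Euclidean data (rows = observations)."""
    n, m = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled unbiased covariance estimate of the two samples.
    pooled = ((n - 1) * np.cov(x, rowvar=False)
              + (m - 1) * np.cov(y, rowvar=False)) / (n + m - 2)
    return (n * m) / (n + m) * diff @ np.linalg.solve(pooled, diff)


def bhattacharyya_gaussian(x, y):
    """Bhattacharyya distance between Gaussians fitted to each sample."""
    diff = x.mean(axis=0) - y.mean(axis=0)
    sx = np.cov(x, rowvar=False)
    sy = np.cov(y, rowvar=False)
    s = (sx + sy) / 2
    # Mean term: (1/8) * diff^T S^{-1} diff.
    term_mean = diff @ np.linalg.solve(s, diff) / 8
    # Covariance term: (1/2) * ln(det S / sqrt(det Sx * det Sy)), via log-determinants.
    _, ld_s = np.linalg.slogdet(s)
    _, ld_x = np.linalg.slogdet(sx)
    _, ld_y = np.linalg.slogdet(sy)
    term_cov = 0.5 * (ld_s - 0.5 * (ld_x + ld_y))
    return term_mean + term_cov


def hellinger_gaussian(x, y):
    """Hellinger distance derived from the Bhattacharyya coefficient exp(-D_B)."""
    bc = np.exp(-bhattacharyya_gaussian(x, y))
    # Clamp at zero to guard against tiny negative values from rounding.
    return np.sqrt(max(0.0, 1.0 - bc))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 3))            # synthetic sample 1 (illustrative only)
    y = rng.normal(loc=0.5, size=(60, 3))   # synthetic sample 2 (illustrative only)
    shift = np.array([10.0, -3.0, 2.0])     # arbitrary common translation

    for f in (hotelling_t2, bhattacharyya_gaussian, hellinger_gaussian):
        # Each measure should agree before and after the common shift.
        print(f.__name__, np.isclose(f(x, y), f(x + shift, y + shift)))
```

In the Euclidean case the only group operation is translation, so invariance under a common shift is what bi-invariance amounts to there; the choice of origin does not affect any of the three values.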