Non-Euclidean data are becoming increasingly prevalent in practice, necessitating a framework for statistical inference analogous to that for Euclidean data. The quantile is one of the most important concepts in traditional statistical inference; we introduce its counterpart, both locally and globally, for data objects in metric spaces. This is achieved by extending the metric distribution function proposed by Wang et al. (2021). Ranks and signs are defined at the local and global levels as a natural consequence of the center-outward ordering of metric spaces induced by the local and global quantiles. We establish theoretical properties, including the root-$n$ consistency and uniform consistency of the local and global empirical quantiles and the distribution-freeness of the ranks and signs. The empirical metric median, defined here as the 0th empirical global metric quantile, is shown both theoretically and numerically to be resistant to contamination. Extensive simulations in a number of metric spaces demonstrate the value of the quantiles. Moreover, we introduce a family of fast rank-based independence tests for a generic metric space. Monte Carlo experiments show good finite-sample performance of these tests.