The distribution function is essential to statistical inference. Through the correspondence theorem in measure theory and the Glivenko-Cantelli and Donsker properties, it is connected with samples to form a directed closed loop, and this connection creates a paradigm for statistical inference. However, existing distribution functions are defined in Euclidean spaces and are no longer convenient for rapidly evolving data objects of a complex nature. It is therefore imperative to develop the concept of a distribution function in a more general space to meet emerging needs. Linearity allows us to use hypercubes to define the distribution function in a Euclidean space; a metric space has no such linearity, so we must work with the metric itself to investigate the probability measure. We introduce a class of metric distribution functions defined through the metric between random objects and a fixed location in a metric space. We overcome this challenging step by proving the correspondence theorem and the Glivenko-Cantelli theorem for metric distribution functions in metric spaces, which lay the foundation for rational statistical inference on metric space-valued data. We then develop a homogeneity test and a mutual independence test for non-Euclidean random objects, and present comprehensive empirical evidence to support the performance of the proposed methods.
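As a rough illustration of a distribution function built from a metric and a fixed location, the sketch below computes an empirical version. It assumes one natural reading of the abstract, which does not spell out a formula: for a fixed location u and a second point v, the value is the fraction of sample points that fall in the closed ball centred at u with radius d(u, v). The function names `empirical_metric_df` and `hamming`, and the toy binary-string sample, are hypothetical and chosen only for this example.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def empirical_metric_df(
    sample: Sequence[T], u: T, v: T, dist: Callable[[T, T], float]
) -> float:
    """Fraction of sample points in the closed ball centred at u with radius dist(u, v).

    Assumed form for illustration; only the metric is used, no linear structure.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    radius = dist(u, v)
    inside = sum(1 for x in sample if dist(u, x) <= radius)
    return inside / len(sample)


def hamming(a: str, b: str) -> int:
    """Hamming distance, a metric on strings of equal length."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(1 for ca, cb in zip(a, b) if ca != cb)


# Toy sample of binary strings, a metric space with no linear structure.
sample = ["0000", "0001", "0111", "1111", "0011"]

# Distances from "0000" to the sample are 0, 1, 3, 4, 2; the radius is 2.
value = empirical_metric_df(sample, "0000", "0011", hamming)
print(value)
```

Because the function takes the metric as an argument, the same code applies unchanged to other kinds of random objects (for example, points on a sphere with geodesic distance), provided a valid metric is supplied.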