The problem of estimating the dimension of a compact subset S of Euclidean space from a random sample of points is considered. The emphasis is on consistency results in the statistical sense, that is, statements of convergence to the true dimension value as the sample size grows to infinity. Among the many available definitions of dimension, we have focused (on the grounds of statistical tractability) on three notions: the Minkowski dimension, the correlation dimension and the perhaps less popular concept of pointwise dimension. We prove the statistical consistency of some natural estimators of these quantities. Our proofs partially rely on an instrumental estimator formulated in terms of the empirical volume function V_n(r), defined as the Lebesgue measure of the set of points whose distance to the sample is at most r. In particular, we explore the case in which the true volume function V(r) of the target set S (the Lebesgue measure of the set of points whose distance to S is at most r) is a polynomial on some interval starting at zero. An empirical study is also included. Our study aims to provide some theoretical support, and some practical insights, for the problem of deciding whether or not the set S has a dimension smaller than that of the ambient space. This is a major statistical motivation of dimension studies, in connection with the so-called Manifold Hypothesis.
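The abstract does not spell out the instrumental estimator, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the Minkowski-dimension relation that V(r) behaves like c * r^(D - d) for small r, where D is the ambient dimension and d the dimension of S, approximates V_n(r) by Monte Carlo, and reads d off as D minus the slope of log V_n(r) against log r. The function names `empirical_volume` and `minkowski_dimension_estimate`, the Monte Carlo sample size, the bounding box and the radius range are all hypothetical choices. The example uses the unit circle in R^2, whose volume function is exactly the annulus area V(r) = 4*pi*r for 0 <= r <= 1, i.e. a polynomial on an interval starting at zero, so the estimate should come out close to 2 - 1 = 1.

```python
import numpy as np


def empirical_volume(sample, radii, n_mc=50_000, chunk=2_000, seed=0):
    """Monte Carlo approximation of V_n(r) for every r in `radii`.

    V_n(r) is the Lebesgue measure of the set of points whose distance to
    the sample is at most r. Here it is approximated as
    (volume of a bounding box) * (fraction of uniform box points lying
    within distance r of some sample point).
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Box containing the whole r-neighbourhood of the sample for r <= max(radii).
    lo = sample.min(axis=0) - radii.max()
    hi = sample.max(axis=0) + radii.max()
    box_volume = np.prod(hi - lo)
    points = rng.uniform(lo, hi, size=(n_mc, sample.shape[1]))

    # Distance from each Monte Carlo point to its nearest sample point,
    # computed block by block so the pairwise array stays small.
    nearest = np.empty(n_mc)
    for start in range(0, n_mc, chunk):
        block = points[start:start + chunk]
        sq_dist = ((block[:, None, :] - sample[None, :, :]) ** 2).sum(axis=-1)
        nearest[start:start + chunk] = np.sqrt(sq_dist.min(axis=1))

    # One nearest-distance pass serves every radius at once.
    covered = (nearest[None, :] <= radii[:, None]).mean(axis=1)
    return box_volume * covered


def minkowski_dimension_estimate(sample, radii, **kwargs):
    """Hypothetical log-log estimator: if V(r) ~ c * r**(D - d) for small r,
    the slope of log V_n(r) against log r estimates D - d."""
    ambient_dim = np.asarray(sample).shape[1]
    volumes = empirical_volume(sample, radii, **kwargs)
    slope, _intercept = np.polyfit(np.log(radii), np.log(volumes), deg=1)
    return ambient_dim - slope


if __name__ == "__main__":
    # 500 uniform points on the unit circle, a 1-dimensional subset of R^2.
    # For 0 <= r <= 1 its true volume function is V(r) = 4*pi*r.
    rng = np.random.default_rng(1)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=500)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])
    radii = np.linspace(0.1, 0.4, 7)
    print(empirical_volume(circle, radii) / (4 * np.pi * radii))  # ratios near 1
    print(minkowski_dimension_estimate(circle, radii))  # expected to be near 1
```

The radii are kept well inside [0, 1], where V is a degree-one polynomial; for radii comparable to the gaps between sample points, the union of balls leaves holes and V_n(r) underestimates V(r), which would bias the slope.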