Singularity of Data Analytic Operations

Statistical data are by their very nature indeterminate, in the sense that if one repeated the process of collecting the data, the new data set would differ somewhat from the original. Therefore a statistical method, that is, a map $\Phi$ taking a data set $x$ to a point in some space $F$, should be stable at $x$: small perturbations of $x$ should result in small changes in $\Phi(x)$. Otherwise $\Phi$ is useless at $x$ or, and this is important, near $x$. So one doesn't want $\Phi$ to have "singularities," data sets $x$ such that the limit of $\Phi(y)$ as $y$ approaches $x$ does not exist. (Yes, the same issue arises elsewhere in applied math.) However, broad classes of statistical methods face topological obstructions to continuity: they must have singularities. We show why, and we give lower bounds on the Hausdorff dimension, and even the Hausdorff measure, of the set of singularities of such data maps. There seem to be numerous examples. We mainly apply topological methods to study the (topological) singularities of functions defined on (dense subsets of) "data spaces" and taking values in spaces with nontrivial homology. At least in this book, data spaces are usually compact manifolds. The purpose is to gain insight into the numerical conditioning of statistical description, data summarization, and inference and learning methods. We prove general results that can often be used to bound the dimension of the singular set from below, and we apply these topological results to derive lower bounds on the Hausdorff measure of the singular set. We apply these methods to the study of plane fitting and of measuring the location of data on spheres. This is not a "final" version, merely another attempt.
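
To make the notion of a singularity concrete, here is a minimal Python sketch using one possible location statistic on the circle $S^1$ (the one-dimensional sphere): the mean direction, i.e. the angle of the vector sum of the data points. The choice of this statistic, and the names `mean_direction` and `angular_gap`, are illustrative assumptions, not necessarily the definitions used in the book. For an antipodal pair of points the vector sum vanishes, so the statistic is undefined there, and two perturbations of size $10^{-6}$ in opposite directions send the output to nearly opposite points of the circle. Hence the limit of $\Phi(y)$ as $y \to x$ does not exist.

```python
import math


def mean_direction(angles, tol=1e-12):
    """Mean direction of points on the unit circle, given as angles in radians.

    Illustrative choice of location statistic (an assumption for this sketch).
    Returns the angle in (-pi, pi] of the resultant vector
    (sum cos a_i, sum sin a_i), or None when the resultant is numerically
    zero, where no direction is defined.
    """
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    if math.hypot(c, s) < tol:
        return None  # singular data set: the resultant vanishes
    return math.atan2(s, c)


def angular_gap(a, b):
    """Distance between two angles measured along the circle, in [0, pi]."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


eps = 1e-6
x = [0.0, math.pi]         # antipodal pair: the statistic is undefined here
y_plus = [eps, math.pi]    # first point nudged counterclockwise by eps
y_minus = [-eps, math.pi]  # first point nudged clockwise by eps

d_plus = mean_direction(y_plus)
d_minus = mean_direction(y_minus)
print(mean_direction(x))                      # None
print(f"{d_plus:.4f} {d_minus:.4f}")          # 1.5708 -1.5708
print(f"{angular_gap(d_plus, d_minus):.4f}")  # 3.1416 (about pi)
```

Shrinking `eps` does not shrink the gap between the two outputs, which is exactly the failure of the limit described in the definition of a singularity above.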