We present an approach for determining the smallest number of neurons in a neural network layer such that the topology of the input space can still be learned sufficiently well. We introduce a general procedure based on persistent homology to investigate topological invariants of the manifold on which we suspect the data set lies. We specify the required dimensions precisely, assuming that the data are located on or near a smooth manifold. Furthermore, we require that this manifold is connected and carries a commutative group structure in the mathematical sense. These assumptions allow us to derive a decomposition of the underlying space whose topology is well known. We use the representatives of the $k$-dimensional homology groups from the persistence landscape to determine an integer dimension for this decomposition. This number is the dimension of an embedding that is capable of capturing the topology of the data manifold. We derive the theory and validate it experimentally on toy data sets.
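To make the role of the assumptions concrete, the following is a sketch of the standard classification of connected commutative Lie groups, which is presumably the decomposition meant above; the abstract does not name it, so the symbols $G$, $n$, $r$, and $b_k$ are assumptions introduced here for illustration.

```latex
% Assumption: G denotes the data manifold, a connected commutative Lie group of dimension n.
% Standard classification: G is a product of a torus of rank r and a Euclidean factor.
G \;\cong\; T^{r} \times \mathbb{R}^{\,n-r},
\qquad T^{r} = \underbrace{S^{1} \times \dots \times S^{1}}_{r \text{ factors}} .

% The Euclidean factor is contractible, so G has the homology of T^r.
% By the Kuenneth formula, the k-th Betti number is a binomial coefficient:
b_k(G) \;=\; \operatorname{rank} H_k(G;\mathbb{Z}) \;=\; \binom{r}{k},
\qquad \text{in particular } b_1(G) = r .
```

Under this reading, the number of long-lived $1$-dimensional classes found by persistent homology would estimate the torus rank $r$, which is the integer needed to fix the decomposition.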