We study, through Betti numbers, how the topology of the feature embedding space changes as data passes through the layers of a well-trained deep neural network (DNN). Motivated by existing studies that use simplicial complexes on shallow fully connected networks (FCNs), we present an extended analysis that uses cubical homology instead, covering a variety of popular deep architectures and real image datasets. We demonstrate that as depth increases, a topologically complicated dataset is transformed into a simple one, and the Betti numbers attain their lowest possible values. The rate at which topological complexity decays, used as a metric, helps quantify the impact of architectural choices on generalization ability. Interestingly, from a representation-learning perspective, we highlight several invariances, namely topological invariance of (1) an architecture on similar datasets; (2) the embedding space of a dataset across architectures of variable depth; (3) the embedding space to input resolution/size; and (4) data sub-sampling. To further demonstrate the link between the expressivity and the generalization capability of a network, we consider the task of ranking pre-trained models for a downstream classification task (transfer learning). Compared to existing approaches, the proposed metric correlates better with the accuracy actually achievable by fine-tuning the pre-trained model.
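The abstract does not say how Betti numbers are obtained from a cubical complex. As an illustration only, and not the authors' pipeline (which works on layer-wise feature embeddings), here is a minimal sketch that computes b0 (connected components) and b1 (holes) for a 2D binary image. The image is treated as the cubical complex formed by the closed unit squares of its foreground pixels. The sketch relies on the standard 2D duality for such complexes. Foreground components are counted with 8-connectivity, because closed squares touching only at a corner are connected. Holes are the bounded background regions, counted with 4-connectivity. The function names `cubical_betti_2d` and `_count_components` are hypothetical.

```python
from collections import deque

# 4- and 8-neighbourhood offsets on a pixel grid.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def _count_components(mask, neighbors):
    """Count connected components of True cells in a 2D boolean grid (BFS)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in neighbors:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def cubical_betti_2d(image):
    """Return (b0, b1) of the cubical complex built from the closed unit
    squares of the foreground (non-zero) pixels of a 2D binary image."""
    cols = len(image[0])
    # Pad with a ring of background so the unbounded outer region is one component.
    padded = ([[False] * (cols + 2)]
              + [[False] + [bool(v) for v in row] + [False] for row in image]
              + [[False] * (cols + 2)])
    background = [[not v for v in row] for row in padded]
    # b0: closed squares sharing only a corner are connected -> 8-connectivity.
    b0 = _count_components(padded, N8)
    # b1: bounded background regions (dual 4-connectivity), minus the outer region.
    b1 = _count_components(background, N4) - 1
    return b0, b1
```

For example, `cubical_betti_2d([[1, 1, 1], [1, 0, 1], [1, 1, 1]])` describes a single ring and gives `(1, 1)`. Two pixels that touch only diagonally, `[[1, 0], [0, 1]]`, form one contractible piece and give `(1, 0)`. The paper's setting involves higher-dimensional embeddings and higher Betti numbers. In that setting, a dedicated cubical-homology library would replace this hand-rolled 2D shortcut.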