Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data. It is known that data passing through a trained DNN classifier undergoes a series of geometric and topological simplifications. While some progress has been made toward understanding these transformations in neural networks with smooth activation functions, an understanding is still needed in the more general setting of non-smooth activation functions, such as the rectified linear unit (ReLU), which tend to perform better. Here we propose that the geometric transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow, a tool from differential geometry that evolves a manifold by smoothing its curvature in order to identify its topology. To illustrate this idea, we present a computational framework to quantify the geometric changes that occur as data passes through successive layers of a DNN, and use this framework to motivate a notion of `global Ricci network flow' that can be used to assess a DNN's ability to disentangle complex data geometries to solve classification problems. By training more than 1,500 DNN classifiers of different widths and depths on synthetic and real-world data, we show that the strength of global Ricci network flow-like behaviour correlates with accuracy for well-trained DNNs, independently of depth, width and data set. Our findings motivate the application of tools from differential and discrete geometry to the problem of explainability in deep learning.
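For reference, Hamilton's Ricci flow in its standard smooth form evolves a Riemannian metric over time as written below. The symbols $g_{ij}$ (components of the metric) and $R_{ij}$ (components of the Ricci curvature tensor) are the usual textbook notation and are assumptions here, since the abstract does not fix any notation. The discrete, layer-wise analogue used for networks is not specified in this text and is not reproduced.

$$
\frac{\partial g_{ij}}{\partial t} = -2\, R_{ij}
$$

Under this equation, regions of positive Ricci curvature contract and regions of negative Ricci curvature expand, which is the sense in which the flow is said to smooth curvature.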