The Geometry of the Set of Equivalent Linear Neural Networks

We characterize the geometry and topology of the set of all weight vectors for which a linear neural network computes the same linear transformation $W$. This set of weight vectors is called the fiber of $W$ (under the matrix multiplication map), and it is embedded in the Euclidean weight space of all possible weight vectors. The fiber is an algebraic variety that is not necessarily a manifold. We describe a natural way to stratify the fiber, that is, to partition the algebraic variety into a finite set of manifolds of varying dimensions called strata. We call this set of strata the rank stratification. We derive the dimensions of these strata and the relationships by which they adjoin each other. Although the strata are disjoint, their closures are not. Our strata satisfy the frontier condition: if a stratum intersects the closure of another stratum, then the former stratum is a subset of the closure of the latter stratum. Each stratum is a manifold of class $C^\infty$ embedded in weight space, so it has a well-defined tangent space and normal space at every point (weight vector). We show how to determine the subspaces tangent to and normal to a specified stratum at a specified point on the stratum, and we construct elegant bases for those subspaces. To help achieve these goals, we first derive what we call a Fundamental Theorem of Linear Neural Networks, analogous to what Strang calls the Fundamental Theorem of Linear Algebra. We show how to decompose each layer of a linear neural network into a set of subspaces that reveal how information flows through the neural network. Each stratum of the fiber represents a different pattern by which information flows (or fails to flow) through the neural network. The topology of a stratum depends solely on this decomposition. So does its geometry, up to a linear transformation in weight space.
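The following NumPy code is a minimal sketch that makes the fiber and the idea of a rank-based stratification concrete. It is an illustration under stated assumptions, not the authors' construction. It assumes that `weights[j - 1]` stores $W_j$ with shape $(d_j, d_{j-1})$, so that the network computes $W = W_L \cdots W_1$. It also assumes, purely for illustration, that a stratum can be labeled by the ranks of all contiguous partial products $W_{k,i} = W_k \cdots W_{i+1}$. The helper names (`partial_product`, `product`, `rank_list`, `move_within_fiber`) are hypothetical. `move_within_fiber` applies the standard change-of-basis symmetry $W_j \mapsto G_j W_j G_{j-1}^{-1}$ with invertible $G_j$ on hidden layers. Because this symmetry maps each $W_{k,i}$ to $G_k W_{k,i} G_i^{-1}$, it keeps the product $W$ and every partial-product rank fixed. The last lines exhibit two points in the fiber of the zero matrix whose rank lists differ.

```python
import numpy as np
from itertools import combinations


def partial_product(weights, i, k):
    """Return W_{k,i} = W_k ... W_{i+1}, the map from layer i to layer k.

    Assumed convention: weights[j - 1] holds W_j with shape (d_j, d_{j-1}),
    and 0 <= i < k <= L.
    """
    result = np.eye(weights[i].shape[1])
    for w in weights[i:k]:
        result = w @ result
    return result


def product(weights):
    """Matrix multiplication map: (W_1, ..., W_L) -> W_L ... W_1."""
    return partial_product(weights, 0, len(weights))


def rank_list(weights, tol=1e-9):
    """Ranks of all contiguous partial products W_{k,i}, keyed by (i, k).

    Illustrative stratum label (an assumption, not the paper's definition).
    """
    L = len(weights)
    return {
        (i, k): int(np.linalg.matrix_rank(partial_product(weights, i, k), tol=tol))
        for i, k in combinations(range(L + 1), 2)
    }


def move_within_fiber(weights, rng):
    """Apply W_j -> G_j W_j G_{j-1}^{-1} with random invertible hidden-layer G_j.

    G_0 and G_L are identities, so the end-to-end product is unchanged.
    """
    L = len(weights)
    dims = [weights[0].shape[1]] + [w.shape[0] for w in weights]  # d_0, ..., d_L
    gs = [np.eye(dims[0])]
    gs += [rng.standard_normal((d, d)) for d in dims[1:L]]  # almost surely invertible
    gs += [np.eye(dims[L])]
    return [gs[j + 1] @ weights[j] @ np.linalg.inv(gs[j]) for j in range(L)]


# Usage: a 3-layer network with hypothetical widths d_0=3, d_1=2, d_2=4, d_3=3.
rng = np.random.default_rng(0)
widths = [3, 2, 4, 3]
weights = [rng.standard_normal((widths[j + 1], widths[j])) for j in range(3)]

moved = move_within_fiber(weights, rng)
print(np.allclose(product(weights), product(moved)))  # same W: stays in the fiber
print(rank_list(weights) == rank_list(moved))  # same rank list

# Two points in the fiber of the zero matrix with different rank lists.
a = [np.zeros((2, 3)), rng.standard_normal((4, 2)), rng.standard_normal((3, 4))]
b = [np.zeros((2, 3)), np.zeros((4, 2)), np.zeros((3, 4))]
print(np.allclose(product(a), 0), np.allclose(product(b), 0))
print(rank_list(a)[(1, 3)], rank_list(b)[(1, 3)])
```

In the zero-matrix example, `a` lets information pass through layers 2 and 3 but blocks it at layer 1, while `b` blocks it everywhere. Both compute $W = 0$, yet their partial-product ranks differ, which is the kind of difference in information flow that distinguishes strata.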
