Understanding why gradient-based training in deep networks exhibits strong implicit bias remains challenging, in part because tractable singular-value dynamics are typically available only for balanced deep linear models. We propose an alternative route based on two theoretically grounded and empirically testable signatures of deep Jacobians: depth-induced exponential scaling of ordered singular values and strong spectral separation. Adopting a fixed-gates view of piecewise-linear networks, where Jacobians reduce to products of masked linear maps within a single activation region, we prove the existence of Lyapunov exponents governing the top singular values at initialization, give closed-form expressions in a tractable masked model, and quantify finite-depth corrections. We further show that sufficiently strong separation forces singular-vector alignment in matrix products, yielding an approximately shared singular basis for intermediate Jacobians. Together, these results motivate an approximation regime in which singular-value dynamics become effectively decoupled, mirroring classical analyses of balanced deep linear models without requiring balancing. Experiments in fixed-gates settings validate the predicted scaling, alignment, and resulting dynamics, supporting a mechanistic account of emergent low-rank Jacobian structure as a driver of implicit bias.
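To fix notation for the two signatures, one minimal formalization consistent with the abstract (the symbols \(J_L\), \(W_l\), \(D_l\), and \(\lambda_i\) are ours; the paper's precise statement may differ) is
\[
\lambda_i \;=\; \lim_{L \to \infty} \frac{1}{L} \log \sigma_i(J_L),
\qquad
J_L \;=\; D_L W_L \cdots D_1 W_1,
\]
where \(W_l\) are the layer weight matrices, \(D_l\) are the fixed diagonal 0/1 gate masks of a single activation region, and \(\sigma_i(\cdot)\) is the \(i\)-th ordered singular value. Depth-induced exponential scaling then reads \(\sigma_i(J_L) \approx e^{\lambda_i L}\), and strong spectral separation asks that the gaps \(\lambda_i - \lambda_{i+1}\) remain bounded away from zero as depth grows.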
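As a numerical sanity check of both signatures, the sketch below (our construction, not the paper's code; the width, depth, gate keep-probability, and Gaussian initialization scale are all illustrative assumptions) estimates the top Lyapunov exponents of a fixed-gates product with the standard QR method, which avoids the under/overflow of forming the full product and taking its SVD:

```python
import numpy as np

# Illustrative settings (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
width, depth, p = 64, 200, 0.5   # layer width, product depth, gate keep-probability
k = 5                            # number of top exponents to track

# QR-based Lyapunov-exponent estimation: push an orthonormal k-frame
# through the masked product D_L W_L ... D_1 W_1 and accumulate
# log|R_ii| at each step; acc[i]/L converges to the i-th exponent.
Q = np.linalg.qr(rng.standard_normal((width, k)))[0]
acc = np.zeros(k)
for l in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)  # i.i.d. Gaussian layer (assumed scale)
    D = (rng.random(width) < p).astype(float)                 # fixed 0/1 gates for this layer
    Q, R = np.linalg.qr((D[:, None] * W) @ Q)                 # masked map, then re-orthonormalize
    acc += np.log(np.abs(np.diag(R)))                         # ~ log of the top-k ordered singular values

lyap = acc / depth
print("estimated top Lyapunov exponents:", np.round(lyap, 4))
print("spectral gaps lambda_i - lambda_{i+1}:", np.round(lyap[:-1] - lyap[1:], 4))
```

When the gaps printed on the last line are clearly positive, the frame Q also stabilizes from layer to layer, a crude numerical proxy for the singular-vector alignment the abstract attributes to strong spectral separation.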