The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes a strictly additive inductive bias on feature transformations, thereby limiting the network's capacity to model complex state transitions. In this paper, we introduce Deep Delta Learning (DDL), a novel architecture that generalizes the standard residual connection by modulating the identity shortcut with a learnable, data-dependent geometric transformation. This transformation, termed the Delta Operator, constitutes a rank-1 perturbation of the identity matrix, parameterized by a reflection direction vector $\mathbf{k}(\mathbf{X})$ and a gating scalar $\beta(\mathbf{X})$. We provide a spectral analysis of this operator, demonstrating that the gate $\beta(\mathbf{X})$ enables dynamic interpolation between identity mapping, orthogonal projection, and geometric reflection. Furthermore, we restructure the residual update as a synchronous rank-1 injection, where the gate acts as a dynamic step size governing both the erasure of old information and the writing of new features. This unification empowers the network to explicitly control the spectrum of its layer-wise transition operator, enabling the modeling of complex, non-monotonic dynamics while preserving the stable training characteristics of gated residual architectures.
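To make the Delta Operator concrete, below is a minimal PyTorch sketch of one DDL layer under stated assumptions: the operator is taken to have the Householder-like form $\mathbf{A}(\mathbf{X}) = \mathbf{I} - \beta(\mathbf{X})\,\mathbf{k}(\mathbf{X})\mathbf{k}(\mathbf{X})^{\top}$ with $\|\mathbf{k}(\mathbf{X})\| = 1$, and the synchronous rank-1 injection is taken to write a scalar value $v(\mathbf{X})$ along the same direction $\mathbf{k}(\mathbf{X})$. The head names (`to_k`, `to_v`, `to_beta`), the scalar-valued write, and the $(0, 2)$ gate range are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeltaLayer(nn.Module):
    """Sketch of one Deep Delta Learning (DDL) layer.

    Assumed update (not the paper's exact parameterization):
        x_out = (I - beta k k^T) x + beta v k
              = x + beta (v - k^T x) k,
    i.e. a synchronous rank-1 update in which the same gate beta
    erases the old value k^T x and writes the new value v along k.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_k = nn.Linear(dim, dim)   # reflection direction k(X) (hypothetical head)
        self.to_v = nn.Linear(dim, 1)     # scalar value v(X) to write (hypothetical head)
        self.to_beta = nn.Linear(dim, 1)  # gating scalar beta(X) (hypothetical head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) feature vectors.
        k = F.normalize(self.to_k(x), dim=-1)        # unit-norm direction, ||k|| = 1
        v = self.to_v(x)                             # (batch, 1) new content along k
        beta = 2.0 * torch.sigmoid(self.to_beta(x))  # (batch, 1), gate in (0, 2)
        # Spectrum of A = I - beta k k^T: eigenvalue (1 - beta) on span(k), 1 elsewhere.
        #   beta -> 0: identity mapping; beta = 1: orthogonal projection onto the
        #   complement of k; beta -> 2: Householder reflection (eigenvalue -1 along k).
        k_dot_x = (k * x).sum(dim=-1, keepdim=True)  # old value k^T x
        # Erasure and writing share the single dynamic step size beta.
        return x + beta * (v - k_dot_x) * k
```

Under this reading, the gate range $(0, 2)$ lets a single scalar sweep the transition operator's eigenvalue along $\mathbf{k}(\mathbf{X})$ through $(-1, 1)$, which is one way to realize the non-monotonic layer-wise dynamics described above.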