In this paper, we study the Karush-Kuhn-Tucker (KKT) points of the maximum-margin problem associated with homogeneous neural networks, including fully-connected and convolutional architectures. In particular, we investigate the relationship between such KKT points across networks of different widths. We introduce and formalize the \textbf{KKT point embedding principle}, establishing that KKT points of a homogeneous network's max-margin problem ($P_{\Phi}$) can be embedded into the KKT points of a wider network's problem ($P_{\tilde{\Phi}}$) via specific linear isometric transformations. We rigorously prove that this principle holds for neuron splitting in fully-connected networks and for channel splitting in convolutional neural networks. Furthermore, we connect this static embedding to the dynamics of gradient flow training with smooth losses: trajectories initiated from appropriately mapped points remain mapped throughout training, and the resulting $\omega$-limit sets of directions are correspondingly mapped, so that alignment with KKT directions is preserved dynamically whenever directional convergence occurs. We corroborate this trajectory preservation with several experiments. Our findings offer insights into the effects of network width, parameter redundancy, and the structural connections between solutions found via optimization in homogeneous networks of varying sizes.
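As a minimal illustration of the flavor of such a splitting map (the notation here is our own, for the simplest two-layer, degree-one case, and need not match the paper's general construction): take $\Phi(x;\theta)=\sum_{j=1}^{m}a_j\,\sigma(w_j^\top x)$ with $\sigma$ positively homogeneous of degree one (e.g., ReLU). Splitting neuron $k$ via
\[
T:\;(w_k,a_k)\;\longmapsto\;\Bigl(\tfrac{w_k}{\sqrt{2}},\tfrac{a_k}{\sqrt{2}}\Bigr)\oplus\Bigl(\tfrac{w_k}{\sqrt{2}},\tfrac{a_k}{\sqrt{2}}\Bigr)
\quad\text{gives}\quad
2\cdot\tfrac{a_k}{\sqrt{2}}\,\sigma\!\Bigl(\tfrac{w_k^\top x}{\sqrt{2}}\Bigr)=a_k\,\sigma(w_k^\top x),
\]
and $\|T(\theta)\|^2=\|\theta\|^2$, so $T$ is a linear isometry with $\tilde{\Phi}(x;T(\theta))=\Phi(x;\theta)$; the margin constraints of $P_{\Phi}$ and $P_{\tilde{\Phi}}$ therefore agree at mapped points, which is the mechanism by which KKT conditions transfer.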
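To make the dynamic claim concrete, the following is a minimal numerical sketch (our own illustration, not the paper's experimental code): it trains a narrow two-layer ReLU network and its split-initialized wide counterpart with identical gradient-descent steps on toy data, assuming logistic loss as the smooth loss, and checks that the wide trajectory remains the image of the narrow one under the splitting map, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (labels in {-1, +1}); purely illustrative.
n, d = 32, 5
X = rng.normal(size=(n, d))
y = np.sign(rng.normal(size=n))

relu = lambda z: np.maximum(z, 0.0)

def grads(W, a, X, y):
    # Two-layer homogeneous net Phi(x) = sum_j a_j * relu(w_j . x),
    # logistic loss L = sum_i log(1 + exp(-y_i * Phi(x_i))).
    H = relu(X @ W.T)                  # (n, m) hidden activations
    out = H @ a                        # (n,) network outputs
    r = -y / (1.0 + np.exp(y * out))   # dL/d(out)
    ga = H.T @ r                       # gradient w.r.t. outer weights a
    S = (X @ W.T > 0).astype(float)    # relu'(w_j . x_i)
    gW = ((r[:, None] * S) * a).T @ X  # gradient w.r.t. inner weights W
    return gW, ga

def split(W, a):
    # Isometric neuron splitting: (w, a) -> ((w, a)/sqrt2, (w, a)/sqrt2).
    return np.vstack([W, W]) / np.sqrt(2.0), np.concatenate([a, a]) / np.sqrt(2.0)

# Narrow net with m neurons; wide net initialized as its split image.
m = 4
W = rng.normal(size=(m, d)); a = rng.normal(size=m)
Wt, at = split(W, a)

lr = 0.05
for step in range(500):
    gW, ga = grads(W, a, X, y)
    W -= lr * gW; a -= lr * ga
    gWt, gat = grads(Wt, at, X, y)
    Wt -= lr * gWt; at -= lr * gat

# The wide trajectory should remain the image of the narrow one:
# differences are expected at floating-point level.
Wm, am = split(W, a)
print(np.max(np.abs(Wt - Wm)), np.max(np.abs(at - am)))
```

The check works because, at a mapped point, the gradient with respect to each split copy is exactly $1/\sqrt{2}$ times the narrow gradient, so a gradient-descent step (here a discretization of the gradient flow in the abstract) commutes with the splitting map.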