Weight initialization remains decisive for neural network optimization, yet existing methods are largely layer-agnostic. We study initialization for deeply-supervised architectures with auxiliary classifiers, where untrained auxiliary heads can destabilize early training through gradient interference. We propose LION-DG, a layer-informed initialization that zero-initializes the auxiliary classifier heads while applying standard He initialization to the backbone. We prove that this implements Gradient Awakening: auxiliary gradients are exactly zero at initialization and then phase in naturally as the weights grow, providing an implicit warmup without additional hyperparameters. Experiments on CIFAR-10 and CIFAR-100 with DenseNet-DS and ResNet-DS architectures demonstrate: (1) DenseNet-DS: 8.3% faster convergence on CIFAR-10 with comparable accuracy; (2) hybrid approach: combining LSUV with LION-DG achieves the best accuracy (81.92% on CIFAR-10); (3) ResNet-DS: a positive speedup on CIFAR-100 (+11.3%) with a side-tap auxiliary design. We identify architecture-specific trade-offs and provide clear guidelines for practitioners. LION-DG is simple, requires zero hyperparameters, and adds no computational overhead.
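A minimal sketch of the initialization rule described above, assuming a PyTorch model whose auxiliary classifier heads are collected under an attribute named aux_heads (the attribute name and the lion_dg_init helper are hypothetical choices for illustration; the paper's actual implementation may organize the heads differently):

```python
import torch.nn as nn

def lion_dg_init(model: nn.Module, aux_head_attr: str = "aux_heads") -> None:
    """Zero-initialize auxiliary classifier heads; He-initialize the backbone."""
    aux_container = getattr(model, aux_head_attr, None)
    aux_modules = set(aux_container.modules()) if aux_container is not None else set()

    for m in model.modules():
        if not isinstance(m, (nn.Linear, nn.Conv2d)):
            continue
        if m in aux_modules:
            # Auxiliary head: zero weights and biases. With W = 0, the gradient
            # passed back into the backbone (W^T dL/dz) is exactly zero at
            # initialization, while the head's own weight gradients are nonzero,
            # so the auxiliary signal "awakens" as the head's weights grow.
            nn.init.zeros_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        else:
            # Backbone: standard He (Kaiming) initialization.
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
```

No hyperparameters are introduced: the only decision is which modules count as auxiliary heads, which is determined by the architecture rather than tuned.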