On the Stability of Growth in Structural Plasticity

Standard deep-learning pipelines usually choose the network architecture before training and keep it fixed throughout optimization. Alternatively, a model can be adapted by editing its structure during training, for example by pruning existing hidden units or growing new ones. Although growth is appealing for adaptive and continual systems, we show that it is not simply the inverse of pruning. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory. We isolate this insertion problem and show that newborn units are often forward-active but backward-starved: they participate in the forward computation, yet receive much weaker gradient signal than incumbent units. This disadvantage is minor in small MLP benchmarks, but becomes clear in harder image-classification settings with a convolutional trunk. In these settings, \textsc{Grow} can achieve high final accuracy during the structural-editing procedure, while \textsc{Prune} is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch. Interventions targeting optimizer state, insertion, selection, and trainability show that improving the integration of newborn units can improve adaptive performance, but does not automatically produce better final subnetworks. In continual-learning benchmarks that stress plasticity loss, \textsc{Grow} becomes competitive mainly when new units have enough time to integrate. Together, these results suggest that \textsc{Grow} should be evaluated not only as an architecture-search operator, but also as a time-sensitive optimization process whose success depends on insertion stability.
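
To make the "forward-active but backward-starved" description concrete, the following is a minimal NumPy sketch of one possible insertion scheme, not necessarily the one used in this work. It assumes a two-layer ReLU MLP trained with mean-squared error, and it assumes newborn units are added with small random incoming weights and zero outgoing weights so that the network output is unchanged at insertion. The names grow_units and per_unit_stats, the layer sizes, and the initialization scales are all hypothetical choices made for illustration. Under these assumptions the newborn units have nonzero activations, yet the gradient reaching their incoming weights is exactly zero on the first step, which is an extreme case of the weaker gradient signal described above.

```python
import numpy as np


def forward(W1, W2, X):
    """Two-layer ReLU MLP: returns pre-activations, hidden activations, outputs."""
    pre = X @ W1.T            # shape (n, H)
    h = np.maximum(pre, 0.0)  # ReLU
    out = h @ W2.T            # shape (n, n_out)
    return pre, h, out


def grow_units(W1, W2, k, rng, scale=0.1):
    """Append k hidden units (assumed scheme: random incoming, zero outgoing weights)."""
    new_in = scale * rng.standard_normal((k, W1.shape[1]))
    new_out = np.zeros((W2.shape[0], k))
    return np.vstack([W1, new_in]), np.hstack([W2, new_out])


def per_unit_stats(W1, W2, X, Y):
    """Per hidden unit: mean |activation| and norm of the incoming-weight gradient."""
    pre, h, out = forward(W1, W2, X)
    d_out = 2.0 * (out - Y) / X.shape[0]  # gradient of batch-averaged squared error
    d_h = d_out @ W2                      # signal arriving at each hidden unit
    d_pre = d_h * (pre > 0)               # backprop through ReLU
    grad_W1 = d_pre.T @ X                 # shape (H, n_in)
    return np.abs(h).mean(axis=0), np.linalg.norm(grad_W1, axis=1)


rng = np.random.default_rng(0)
n_old, n_new = 16, 4                      # hypothetical sizes
X = rng.standard_normal((64, 8))
Y = rng.standard_normal((64, 3))
W1 = 0.5 * rng.standard_normal((n_old, 8))
W2 = 0.5 * rng.standard_normal((3, n_old))

W1_grown, W2_grown = grow_units(W1, W2, n_new, rng)
act, grad_norm = per_unit_stats(W1_grown, W2_grown, X, Y)

print("incumbent mean grad norm :", grad_norm[:n_old].mean())
print("newborn mean grad norm   :", grad_norm[n_old:].mean())
print("newborn mean |activation|:", act[n_old:].mean())
```

In this sketch only the outgoing weights of the newborn units receive gradient at first, so their incoming weights start to move only after the outgoing weights have left zero. Other insertion schemes (for example, nonzero outgoing weights) would trade this delay against a perturbation of the network output at insertion time.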
