Pruning as Evolution: Emergent Sparsity Through Selection Dynamics in Neural Networks

Neural networks are commonly trained in highly overparameterized regimes, yet empirical evidence consistently shows that many parameters become redundant during learning. Most existing pruning approaches impose sparsity through explicit intervention, such as importance-based thresholding or regularization penalties, implicitly treating pruning as a centralized decision applied to a trained model. This assumption is misaligned with the decentralized, stochastic, and path-dependent character of gradient-based training. We propose an evolutionary perspective on pruning: parameter groups (neurons, filters, heads) are modeled as populations whose influence evolves continuously under selection pressure. Under this view, pruning corresponds to population extinction: components with persistently low fitness gradually lose influence and can be removed without discrete pruning schedules and without requiring equilibrium computation. We formalize neural pruning as an evolutionary process over population masses, derive selection dynamics governing mass evolution, and connect fitness to local learning signals. We validate the framework on MNIST using a population-scaled MLP (784--512--256--10) with 768 prunable neuron populations. All dynamics reach dense baselines near 98\% test accuracy. We benchmark post-training hard pruning at target sparsity levels (35--50\%): pruning 35\% yields $\approx$95.5\% test accuracy, while pruning 50\% yields $\approx$88.3--88.6\%, depending on the dynamic. These results demonstrate that evolutionary selection produces a measurable accuracy--sparsity tradeoff without explicit pruning schedules during training.
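To make the selection picture concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not specify the update rule, so the sketch assumes a discrete replicator-style step on population masses. It also assumes a fixed random fitness vector as a stand-in for the local learning signal, and that post-training hard pruning removes the lowest-mass populations. The names `replicator_step`, `hard_prune_mask`, and the step size `lr` are hypothetical.

```python
import random


def replicator_step(masses, fitness, lr=0.1):
    """One discrete replicator-style update (assumed form, not from the paper)."""
    total = sum(masses)
    mean_fit = sum(m * f for m, f in zip(masses, fitness)) / total
    # Populations above the mass-weighted mean fitness grow; those below shrink.
    new = [max(m * (1.0 + lr * (f - mean_fit)), 0.0)
           for m, f in zip(masses, fitness)]
    # Rescale so total mass is conserved even if clamping occurred.
    scale = total / sum(new)
    return [m * scale for m in new]


def hard_prune_mask(masses, sparsity):
    """Boolean keep-mask that drops the `sparsity` fraction with lowest mass."""
    n_prune = int(round(sparsity * len(masses)))
    order = sorted(range(len(masses)), key=lambda i: masses[i])
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(masses))]


random.seed(0)
n_pop = 768  # 512 + 256 hidden neurons, matching the MLP in the abstract
fitness = [random.random() for _ in range(n_pop)]  # placeholder fitness signal
masses = [1.0] * n_pop

for _ in range(200):
    masses = replicator_step(masses, fitness)

mask = hard_prune_mask(masses, sparsity=0.5)
print(sum(mask), "of", n_pop, "populations kept")
```

In a real training loop the fitness would presumably be recomputed from gradients or activations at each step, and each mass would scale its neuron's output. Here the fitness is held fixed so that the extinction of persistently low-fitness populations is easy to see.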
