Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as the Sparse Distributed Memory Multi-Layer Perceptron (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and leads to performance degradation under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge within previously formed Top-K subnetworks and over the output logits, without requiring replay or task labels. This enables structural realignment while preserving sparse modularity. Experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and representation coverage, offering a structurally grounded solution for sparse continual learning.
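To make the described mechanism concrete, below is a minimal sketch of the selective distillation step, assuming a PyTorch-style model with Top-K hidden activations. All names and hyperparameters (`topk_activate`, `activation_freq`, `freq_threshold`, `lambda_feat`, `lambda_logit`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of Selective Subnetwork Distillation (SSD); names and
# thresholds are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F

def topk_activate(h, k):
    """Keep the k largest activations per sample; zero out the rest."""
    _, idx = torch.topk(h, k, dim=1)
    mask = torch.zeros_like(h).scatter_(1, idx, 1.0)
    return h * mask, mask

def ssd_loss(student_h, teacher_h, student_logits, teacher_logits,
             activation_freq, k, freq_threshold=0.1,
             lambda_feat=1.0, lambda_logit=1.0, temperature=2.0):
    """Distill only inside the teacher's Top-K subnetwork, restricted to
    neurons that fired frequently on previous tasks (no replay, no task labels)."""
    teacher_h_k, teacher_mask = topk_activate(teacher_h, k)
    # Select neurons whose past activation frequency exceeds a threshold
    # (activation_freq is a per-neuron statistic tracked during earlier tasks).
    freq_mask = (activation_freq > freq_threshold).float().unsqueeze(0)
    select = teacher_mask * freq_mask
    # Feature-level distillation restricted to the selected sparse subnetwork.
    feat_loss = F.mse_loss(student_h * select, teacher_h_k * select)
    # Logit-level distillation with a softened teacher distribution.
    logit_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return lambda_feat * feat_loss + lambda_logit * logit_loss
```

In this sketch, the frequency mask confines feature distillation to the previously active sparse subnetwork, while the logit term transfers output-level knowledge; the weighting between the two is left as a tunable assumption.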