Block-sequential continual learning demands that a single model both protect prior solutions from catastrophic forgetting and, without task labels, efficiently determine at inference time which prior solution matches the current input. We present Functional Task Networks (FTN), a parameter-isolation method inspired by structural and dynamical motifs found in the mammalian neocortex. Similar to mixture-of-experts, FTN applies a high-dimensional, self-organizing binary mask over a large population of small but deep networks, each of which plays the role of a neuron; this design is inspired by dendritic models of pyramidal neurons. The mask is produced by a three-stage procedure: (1) gradient descent on a continuous mask identifies task-relevant neurons, (2) a smoothing kernel biases the result toward spatial contiguity, and (3) k-winner-take-all binarizes the resulting group at a fixed capacity budget. Because each neuron is an independent deep network, as in mixture-of-experts, disjoint masks give exactly disjoint gradient updates, providing a structural guarantee against catastrophic forgetting. The same three-stage procedure recovers the sub-network of a previously trained task in a single gradient step, providing unsupervised task segmentation at inference time. We evaluate FTN on three continual-learning benchmarks: (1) a synthetic multi-task classification/regression generator, (2) MNIST with shuffled class labels (pure concept shift), and (3) Permuted MNIST (domain shift). On all three, FTN with fine-grained smoothing (FTN-Slow) exhibits nearly zero forgetting. FTN with a large kernel and only 2 smoothing iterations (FTN-Fast) trades some retention for speed. We show that the spatial organization mechanism reduces the effective mask search from the combinatorial top-k subset problem, O(C(H,K)), to a near-linear scan, O(H), over compact cortical neighborhoods, which the gradient-based update parallelizes.
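The three-stage mask procedure can be illustrated with a minimal NumPy sketch. This is a toy under stated assumptions, not the FTN implementation: each "neuron" is replaced by a fixed random feature rather than a small deep network, the neurons are laid out on a 1-D ring, the smoothing kernel is a box filter, and the loss is mean squared error. The names `smooth_ring`, `k_winner_take_all`, `mask_gradient`, and `infer_mask`, as well as all hyperparameter values, are hypothetical and chosen only for illustration.

```python
import math
import numpy as np

def smooth_ring(scores, kernel_width, iterations):
    """Stage 2 (assumed form): box-filter smoothing on a 1-D ring of neurons."""
    assert kernel_width % 2 == 1, "odd width keeps the output length equal to H"
    kernel = np.ones(kernel_width) / kernel_width
    pad = kernel_width // 2
    out = scores.astype(float)
    for _ in range(iterations):
        padded = np.pad(out, pad, mode="wrap")        # wrap around the ring
        out = np.convolve(padded, kernel, mode="valid")
    return out

def k_winner_take_all(scores, k):
    """Stage 3: binary mask with exactly k ones at the k largest scores."""
    mask = np.zeros(scores.shape, dtype=bool)
    mask[np.argpartition(scores, -k)[-k:]] = True
    return mask

def mask_gradient(m, features, targets):
    """Gradient of the mean squared error of (features @ m) w.r.t. the mask m."""
    residual = features @ m - targets
    return 2.0 * features.T @ residual / len(targets)

def infer_mask(features, targets, k, lr=0.5, kernel_width=3, iterations=1):
    m = np.zeros(features.shape[1])
    m = m - lr * mask_gradient(m, features, targets)  # stage 1: one gradient step
    m = smooth_ring(m, kernel_width, iterations)      # stage 2: spatial contiguity
    return k_winner_take_all(m, k)                    # stage 3: fixed capacity k

# Toy setup (assumption): a "previously trained task" occupies a contiguous block.
rng = np.random.default_rng(0)
H, K, N = 64, 8, 2000                                 # neurons, capacity, samples
true_mask = np.zeros(H, dtype=bool)
true_mask[20:28] = True
features = rng.standard_normal((N, H))                # stand-in for neuron outputs
targets = features @ true_mask.astype(float)          # output of the old sub-network

recovered = infer_mask(features, targets, K)
print("recovered neurons:", np.flatnonzero(recovered))
print("size-K subsets an exhaustive search would face:", math.comb(H, K))
```

In this toy, a single gradient step from an all-zero continuous mask already assigns the largest scores to the neurons of the stored block, so smoothing plus k-winner-take-all selects that block without enumerating subsets. The sketch says nothing about how FTN trains the neurons themselves or how the kernel size and iteration count are chosen for FTN-Slow and FTN-Fast.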