Structural Plasticity as Active Inference: A Biologically-Inspired Architecture for Homeostatic Control

From arXiv; National Science Foundation (NSF) Workshop on Brain-Inspired Dynamics for Engineering Energy-Efficient Circuits and Artificial Intelligence

Traditional neural networks, while powerful, rely on biologically implausible learning mechanisms such as global backpropagation. This paper introduces the Structurally Adaptive Predictive Inference Network (SAPIN), a novel computational model inspired by the principles of active inference and by the morphological plasticity observed in biological neural cultures. SAPIN operates on a 2D grid where processing units, or cells, learn by minimizing local prediction errors. The model features two primary, concurrent learning mechanisms: a local, Hebbian-like synaptic plasticity rule based on the temporal difference between a cell's actual activation and its learned expectation, and a structural plasticity mechanism in which cells physically migrate across the grid to optimize their information-receptive fields. This dual approach allows the network to learn both how to process information (synaptic weights) and where to position its computational resources (network topology). We validated the SAPIN model on the classic CartPole reinforcement learning benchmark. Our results demonstrate that the architecture can solve the CartPole task with robust performance. The network's intrinsic drive to minimize prediction error and maintain homeostasis was sufficient to discover a stable balancing policy. We also found that continual learning led to instability, whereas locking the network's parameters after it first achieved success yielded a stable policy. When evaluated for 100 episodes after locking (repeated over 100 successful agents), the locked networks maintained an average success rate of 82%.
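The abstract does not state the update equations, so the following is only a minimal sketch of the two mechanisms it names, under loudly stated assumptions. The names `Cell`, `learn`, `migrate`, `error_at`, and `GRID`, the learning rates, and the specific update forms are all hypothetical and are not taken from the paper. The synaptic rule here nudges weights to shrink the gap between a cell's activation and its running expectation; the structural rule moves a cell one grid step toward the adjacent site with the lowest estimated prediction error.

```python
import random

GRID = 8  # hypothetical side length of the 2D grid


class Cell:
    """A processing unit with a grid position, input weights, and an expectation."""

    def __init__(self, x, y, n_inputs, rng):
        self.x, self.y = x, y
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
        self.expectation = 0.0  # running estimate of the cell's own activation

    def activate(self, inputs):
        # Weighted sum of the inputs (assumed linear activation).
        return sum(w * v for w, v in zip(self.weights, inputs))

    def learn(self, inputs, lr=0.05, exp_rate=0.1):
        """One local update; returns the prediction error measured before it."""
        error = self.activate(inputs) - self.expectation
        # Assumed Hebbian-like rule: each weight changes in proportion to its
        # own input times the local error; no global gradient is used.
        for i, v in enumerate(inputs):
            self.weights[i] -= lr * error * v
        # The expectation tracks the activation it is trying to predict.
        self.expectation += exp_rate * error
        return error


def migrate(cell, error_at, grid=GRID):
    """Move to the adjacent in-bounds site (or stay) with the lowest error.

    `error_at(x, y)` is a hypothetical callable that estimates the prediction
    error the cell would experience at that grid position.
    """
    candidates = [(cell.x, cell.y)]
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = cell.x + dx, cell.y + dy
        if 0 <= nx < grid and 0 <= ny < grid:
            candidates.append((nx, ny))
    cell.x, cell.y = min(candidates, key=lambda p: error_at(*p))


if __name__ == "__main__":
    rng = random.Random(0)
    cell = Cell(0, 0, n_inputs=2, rng=rng)
    for _ in range(50):
        cell.learn([1.0, 0.5])
    migrate(cell, lambda x, y: abs(x - 5) + abs(y - 5))
    print(cell.x, cell.y, cell.expectation)
```

Both functions use only information local to one cell, which is the property the abstract contrasts with global backpropagation. A faithful implementation would need the paper's actual rules for activation, expectation, and migration cost.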
