We introduce an unsupervised visual representation learning system that relies entirely on local plasticity rules, with no labels, backpropagation, or global error signals. The model is a VisNet-inspired hierarchical architecture that combines opponent-color inputs, multi-frequency Gabor and wavelet feature streams, competitive normalization with lateral inhibition, saliency modulation, associative memory, and a feedback loop. All representation learning occurs through continuous local plasticity applied to unlabeled image streams over 300 epochs. Performance is evaluated with a linear probe whose training is confined to the readout. The system achieves 80.1 percent accuracy on CIFAR-10 and 47.6 percent on CIFAR-100, improving over a Hebbian-only baseline. Ablation studies show that anti-Hebbian decorrelation, free-energy-inspired plasticity, and associative memory are the main contributors and that their effects are strongly synergistic. Without any learning, the fixed architecture alone reaches 61.4 percent on CIFAR-10; plasticity therefore adds 18.7 percentage points on top of the architectural inductive bias, so performance is not explained by inductive bias alone. Control analyses show that independently trained probes match co-trained ones to within 0.3 percentage points, and that a nearest-class-mean classifier, which requires no gradient-based training, reaches 78.3 percent, indicating that the class structure is intrinsic to the learned features rather than imposed by the probe. Overall, the system narrows but does not close the gap to backpropagation-trained CNNs (5.7 percentage points on CIFAR-10 and 7.5 percentage points on CIFAR-100), demonstrating that structured local plasticity alone can learn strong visual representations from raw unlabeled data.
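The abstract names anti-Hebbian decorrelation as one of the main contributors but does not state the rule. The following is a minimal sketch, assuming a textbook linear lateral anti-Hebbian rule rather than the authors' exact formulation. Two linear units receive correlated inputs and inhibit each other through a symmetric lateral weight l. That weight is updated only from the two units' own co-activity (delta_l = -eta * y1 * y2), so the update is strictly local and uses no error signal. The toy input distribution and all function names (`sample_correlated_input`, `lateral_response`, `train_lateral_weight`) are hypothetical illustrations.

```python
import math
import random


def correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs) / n
    va = sum((a - ma) ** 2 for a, _ in pairs) / n
    vb = sum((b - mb) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(va * vb)


def sample_correlated_input(rng):
    """Hypothetical toy input: two channels sharing a source s (true corr = 0.5)."""
    s = rng.gauss(0.0, 1.0)
    return s + rng.gauss(0.0, 1.0), s + rng.gauss(0.0, 1.0)


def lateral_response(x1, x2, l):
    """Steady state of y = x + L y with symmetric lateral weight l (|l| < 1).

    Solves (I - L) y = x in closed form for the two-unit case.
    """
    det = 1.0 - l * l
    return (x1 + l * x2) / det, (x2 + l * x1) / det


def train_lateral_weight(steps=20000, eta=0.001, seed=0):
    """Local anti-Hebbian learning of the lateral weight (assumed rule)."""
    rng = random.Random(seed)
    l = 0.0
    for _ in range(steps):
        x1, x2 = sample_correlated_input(rng)
        y1, y2 = lateral_response(x1, x2, l)
        # Positively co-active units make l more negative (stronger inhibition);
        # anti-correlated activity weakens the inhibition.
        l -= eta * y1 * y2
    return l


if __name__ == "__main__":
    l = train_lateral_weight()
    rng = random.Random(1)
    xs = [sample_correlated_input(rng) for _ in range(10000)]
    ys = [lateral_response(x1, x2, l) for x1, x2 in xs]
    # Analytic fixed point for this toy input: l = -2 + sqrt(3), about -0.268
    print(f"lateral weight: {l:.3f}")
    print(f"input corr:  {correlation(xs):.3f}")
    print(f"output corr: {correlation(ys):.3f}")
```

For this toy input, E[x1^2] = E[x2^2] = 2 and E[x1 x2] = 1. Setting E[y1 y2] = 0 therefore gives l^2 + 4l + 1 = 0, whose stable root is l = -2 + sqrt(3). After training, the output correlation should sit near zero while the input correlation stays near 0.5. A small learning rate is used here because the stochastic update makes l fluctuate around that fixed point.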