The Forward-Forward (FF) algorithm trains networks layer by layer using a local "goodness function," yet sum-of-squares (SoS) remains the only choice that has been studied. We systematically explore the goodness-function design space and identify a unifying principle: the goodness function must be sensitive to the shape of neural activity, not its total energy. This principle is motivated by two observations: deep network activations follow heavy-tailed distributions, and discriminative information is often concentrated in peak activities. We propose two complementary families: selective functions (top-k, entmax-weighted energy), which measure only peak activity, and shape-sensitive functions (excess kurtosis, or "burstiness," and higher-order moments), which reward heavy-tailed distributions via scale-invariant statistics. We combine these with separate label-feature forwarding (FFCL) and run controlled experiments across 13 goodness functions, 5 activations, 6 datasets, and three continuous sweeps, each of which traces a characteristic inverted-U. The result is 89.0% on Fashion-MNIST and 98.2±0.1% on MNIST (4×2000), a +32.6pp gain over SoS, with consistent improvements across all benchmarks (+72pp USPS, +52pp SVHN). Because burstiness is scale-invariant, it is particularly robust to magnitude shifts across layers and datasets.
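The contrast between energy and shape can be illustrated with a minimal sketch, which is not the authors' implementation. The function names (`sos_goodness`, `topk_goodness`, `burstiness`) are hypothetical, top-k goodness is assumed here to be the mean of the k largest activations, and burstiness is assumed to be the population excess kurtosis, with 0.0 returned for a constant vector as an arbitrary convention. The two example vectors are chosen to have identical total energy but different shapes.

```python
def sos_goodness(h):
    """Sum-of-squares goodness: total energy of the activation vector."""
    return sum(x * x for x in h)


def topk_goodness(h, k):
    """Selective goodness (assumed form): mean of the k largest activations."""
    top = sorted(h, reverse=True)[:k]
    return sum(top) / k


def burstiness(h):
    """Shape-sensitive goodness (assumed form): population excess kurtosis.

    Computed as m4 / m2**2 - 3 from central moments, so rescaling h by any
    nonzero constant leaves the value unchanged.
    """
    n = len(h)
    mean = sum(h) / n
    m2 = sum((x - mean) ** 2 for x in h) / n
    m4 = sum((x - mean) ** 4 for x in h) / n
    if m2 == 0:
        # Assumption: a constant vector has no defined kurtosis; return 0.0.
        return 0.0
    return m4 / (m2 * m2) - 3.0


# Two activation vectors with the same total energy but different shapes.
dense = [1, 3, 1, 3, 1, 3, 1, 3]    # energy spread evenly across units
peaked = [6, 2, 0, 0, 0, 0, 0, 0]   # energy concentrated in one peak

print(sos_goodness(dense), sos_goodness(peaked))          # 40 40
print(topk_goodness(dense, 2), topk_goodness(peaked, 2))  # 3.0 4.0
print(burstiness(dense), burstiness(peaked))              # -2.0 1.9375

# Rescaling the activations changes SoS but not burstiness.
scaled = [10 * x for x in peaked]
print(sos_goodness(scaled), burstiness(scaled))
```

In this toy case SoS cannot tell the two vectors apart, while both the selective and the shape-sensitive score rank the peaked vector higher. Multiplying the activations by 10 multiplies SoS by 100 and leaves the burstiness score essentially unchanged.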