Heterogeneous Connectivity in Sparse Networks: Fan-in Profiles, Gradient Hierarchy, and Topological Equilibria

Profiled Sparse Networks (PSN) replace uniform connectivity with deterministic, heterogeneous fan-in profiles defined by continuous, nonlinear functions, so that some neurons have dense receptive fields and others sparse ones. We benchmark PSN across four classification datasets spanning vision and tabular domains, input dimensions from 54 to 784, and network depths of 2--3 hidden layers. At 90% sparsity, all static profiles, including the uniform random baseline, achieve accuracy within 0.2--0.6% of dense baselines on every dataset, demonstrating that heterogeneous connectivity provides no accuracy advantage when hub placement is arbitrary rather than task-aligned. This result holds across sparsity levels (80--99.9%), profile shapes (eight parametric families, lognormal, and power-law), and fan-in coefficients of variation from 0 to 2.5. Internal gradient analysis reveals that structured profiles concentrate gradients at hub neurons by a factor of 2--5x, compared with the roughly uniform (~1x) distribution in random baselines, and that the strength of this hierarchy is predicted by the fan-in coefficient of variation ($r = 0.93$). When PSN fan-in distributions are used to initialise RigL dynamic sparse training, lognormal profiles matched to the equilibrium fan-in distribution consistently outperform standard ERK initialisation, and the advantage grows on harder tasks: +0.16% on Fashion-MNIST ($p = 0.036$, $d = 1.07$), +0.43% on EMNIST, and +0.49% on Forest Cover. RigL converges to a characteristic fan-in distribution regardless of initialisation, so starting at this equilibrium allows the optimiser to refine weights rather than rearrange topology. Which neurons become hubs matters more than how much connectivity varies: random hub placement provides no advantage, whereas optimisation-driven placement does.
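To make the notions of a fan-in profile and its coefficient of variation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a lognormal profile built deterministically from mid-point quantiles and rescaled so that the mean fan-in approximately matches the target sparsity. The function names (`lognormal_fan_in`, `uniform_fan_in`, `coefficient_of_variation`, `build_mask`), the quantile construction, and the random choice of which inputs each neuron connects to are all illustrative assumptions; the abstract does not specify these details.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def lognormal_fan_in(n_neurons, n_inputs, sparsity, sigma):
    """Deterministic per-neuron fan-in taken from lognormal quantiles (assumed form)."""
    target_mean = (1.0 - sparsity) * n_inputs
    # A lognormal has mean exp(mu + sigma^2 / 2); choose mu to hit the target mean.
    mu = math.log(target_mean) - 0.5 * sigma ** 2
    std_normal = NormalDist()
    fan_in = []
    for i in range(n_neurons):
        p = (i + 0.5) / n_neurons  # mid-point quantile assigned to neuron i
        k = math.exp(mu + sigma * std_normal.inv_cdf(p))
        # Round to an integer count and clip to the valid range [1, n_inputs].
        fan_in.append(min(n_inputs, max(1, round(k))))
    return fan_in


def uniform_fan_in(n_neurons, n_inputs, sparsity):
    """Baseline profile: every neuron receives the same number of inputs."""
    k = max(1, round((1.0 - sparsity) * n_inputs))
    return [k] * n_neurons


def coefficient_of_variation(fan_in):
    """Population standard deviation of the fan-in divided by its mean."""
    return pstdev(fan_in) / mean(fan_in)


def build_mask(fan_in, n_inputs, seed=0):
    """Binary mask with fan_in[i] ones in row i; input choice is random (assumption)."""
    rng = random.Random(seed)
    mask = []
    for k in fan_in:
        active = set(rng.sample(range(n_inputs), k))
        mask.append([1 if j in active else 0 for j in range(n_inputs)])
    return mask


if __name__ == "__main__":
    profile = lognormal_fan_in(n_neurons=256, n_inputs=784, sparsity=0.9, sigma=1.0)
    baseline = uniform_fan_in(n_neurons=256, n_inputs=784, sparsity=0.9)
    print("lognormal CV:", coefficient_of_variation(profile))
    print("uniform CV:", coefficient_of_variation(baseline))
```

In this sketch the uniform profile has a coefficient of variation of exactly zero, while increasing `sigma` widens the gap between hub neurons and sparsely connected ones. Rounding and clipping mean the realised sparsity only approximates the target.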
