Recently a network of about a million biological neurons (BNN) has turned out to outperform modern RL methods at playing Pong~\cite{RL}, a reminder that biological neurons remain qualitatively superior e.g. in learning, flexibility and robustness - suggesting we should improve current artificial neurons, e.g. MLP/KAN, toward better agreement with biology. We propose an extension of the KAN approach to neurons containing a model of the local joint distribution: $\rho(\mathbf{x})=\sum_{\mathbf{j}\in B} a_\mathbf{j} f_\mathbf{j}(\mathbf{x})$ for $\mathbf{x} \in [0,1]^d$, adding interpretability and information flow control to KAN, and allowing us to gradually add three basic properties of biological neurons that current artificial ones are missing: 1) biological axons propagate signals in both directions~\cite{axon}, while current artificial networks focus on unidirectional propagation - joint-distribution neurons can repair this by substituting some variables and obtaining conditional values/distributions for the remaining ones. 2) Animals show risk avoidance~\cite{risk}, which requires processing variance, and the real world generally calls for probabilistic models - the proposed neurons can predict and propagate entire distributions as vectors of moments: (expected value, variance) or higher. 3) Biological neurons require local training; beside backpropagation, the proposed approach allows many additional training modes, such as direct estimation, tensor decomposition, and finally a local and very promising one: information bottleneck. The proposed approach is very general; it can also be used as an extension of softmax $\textrm{Pr}\propto \exp(-E)$, e.g. for embeddings in transformers, into their probability distributions operating on a few moments $(a_j)$: $\rho(x)\approx \sum_j a_j f_j(x)$.
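As a minimal numerical sketch of the above (all function names are illustrative, not from the paper), the code below uses the first three orthonormal polynomials $f_j$ on $[0,1]$, estimates joint coefficients by the direct method $a_{\mathbf{j}} = \frac{1}{n}\sum_i f_{j}(x_i) f_{k}(y_i)$, and demonstrates the bidirectional property 1): substituting $x$ into the joint model yields a conditional density for the remaining variable $y$.

```python
# Sketch of a joint-distribution neuron for d=2 variables on [0,1]^2,
# assuming an orthonormal polynomial basis (rescaled Legendre).
# Names like fit_joint/conditional_density are hypothetical, for illustration.
import numpy as np

def f(j, x):
    """Orthonormal basis on [0,1]: f_0=1, f_1=sqrt(3)(2x-1), f_2=sqrt(5)(6x^2-6x+1)."""
    x = np.asarray(x, dtype=float)
    return [np.ones_like(x),
            np.sqrt(3.0) * (2 * x - 1),
            np.sqrt(5.0) * (6 * x**2 - 6 * x + 1)][j]

def fit_joint(X, Y, m=3):
    """Direct (local, backprop-free) training: a_{jk} = mean_i f_j(X_i) f_k(Y_i)."""
    a = np.empty((m, m))
    for j in range(m):
        for k in range(m):
            a[j, k] = np.mean(f(j, X) * f(k, Y))
    return a

def conditional_density(a, x, y):
    """Substitute x to get rho(y|x): sum_{jk} a_{jk} f_j(x) f_k(y) / sum_j a_{j0} f_j(x).

    The denominator is the marginal rho(x), using f_0 = 1 so that the
    k=0 coefficients carry the marginal of the first variable.
    """
    m = a.shape[0]
    num = sum(a[j, k] * f(j, x) * f(k, y) for j in range(m) for k in range(m))
    den = sum(a[j, 0] * f(j, x) for j in range(m))
    return num / den

rng = np.random.default_rng(0)
X, Y = rng.random(100_000), rng.random(100_000)  # uniform, independent sample
a = fit_joint(X, Y)
print(a[0, 0])                            # normalization coefficient, exactly 1
print(conditional_density(a, 0.5, 0.5))   # near 1 for uniform independent data
```

By symmetry of the coefficient tensor, the same fitted $a_{\mathbf{j}}$ can be conditioned on $y$ instead of $x$, which is what makes the propagation bidirectional.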