We propose a proof of concept for a variational distributional neuron: a compute unit formulated as a VAE building block that explicitly carries a prior, an amortized posterior, and a local ELBO. The unit is no longer a deterministic scalar but a distribution: computing is no longer a matter of propagating values, but of contracting a continuous space of possibilities under constraints. Each neuron parameterizes a posterior, propagates a reparameterized sample, and is regularized by the KL term of a local ELBO; the activation is therefore distributional. This "contraction" becomes testable through local constraints and can be monitored via internal measures. The amount of contextual information carried by a unit, as well as the temporal persistence of that information, are tuned locally by distinct constraints. The proposal addresses a structural tension: in sequential generation, causality is predominantly organized in the symbolic space and, even when latents exist, they often remain auxiliary, the effective dynamics being carried by a largely deterministic decoder. In parallel, probabilistic latent-variable models capture factors of variation and uncertainty, but that uncertainty is typically borne by global or parametric mechanisms while the units continue to propagate scalars; hence the pivotal question: if uncertainty is intrinsic to computation, why does the compute unit not carry it explicitly? We therefore draw two axes: (i) the composition of probabilistic constraints, which must be made stable, interpretable, and controllable; and (ii) granularity: if inference is a negotiation of distributions under constraints, should the primitive unit remain deterministic or become distributional? We analyze "collapse" modes and the conditions for a "living neuron", then extend the contribution over time via autoregressive priors over the latent, per unit.
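As a minimal sketch of such a unit (hypothetical names, assuming a PyTorch-style API and a standard-normal prior per unit; not the authors' implementation), each unit amortizes a Gaussian posterior, emits a reparameterized sample, and exposes its own KL term of the local ELBO:

```python
import torch
import torch.nn as nn

class VariationalNeuronLayer(nn.Module):
    """A layer of distributional units (illustrative sketch): each unit carries
    an assumed standard-normal prior, an amortized Gaussian posterior q(z|x),
    and contributes a per-unit KL term to a local ELBO."""

    def __init__(self, in_features: int, units: int):
        super().__init__()
        # Amortized posterior parameters per unit: mean and log-variance.
        self.mu = nn.Linear(in_features, units)
        self.logvar = nn.Linear(in_features, units)

    def forward(self, x: torch.Tensor):
        mu = self.mu(x)
        logvar = self.logvar(x)
        # Reparameterized sample: the activation is a draw, not a deterministic scalar.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # Per-unit KL(q(z|x) || N(0, 1)): the "contraction" penalty of the local ELBO.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0)
        return z, kl
```

Monitoring the per-unit KL provides one of the internal measures mentioned above: a KL pinned near zero signals a collapsed (dead) unit, while a persistently positive KL indicates a unit that is genuinely contracting its space of possibilities, i.e. carrying contextual information.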
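The temporal extension can be sketched in the same hedged spirit (hypothetical names, simplified to diagonal per-unit dynamics): the static prior is replaced by a per-unit autoregressive prior p(z_t | z_{t-1}) with one AR coefficient and one scale per unit, so temporal persistence is tuned locally, unit by unit:

```python
import torch
import torch.nn as nn

class UnitwiseARPrior(nn.Module):
    """Illustrative per-unit autoregressive prior p(z_t | z_{t-1}):
    each unit's prior mean is a scalar linear function of its own previous
    latent, so the persistence of information is a local, per-unit property."""

    def __init__(self, units: int):
        super().__init__()
        # One AR coefficient and one log-scale per unit (diagonal dynamics).
        self.coef = nn.Parameter(torch.zeros(units))
        self.log_scale = nn.Parameter(torch.zeros(units))

    def kl_step(self, mu_q, logvar_q, z_prev):
        # Elementwise KL( N(mu_q, var_q) || N(coef * z_prev, scale^2) ),
        # replacing the static KL term of the local ELBO at each time step.
        mu_p = self.coef * z_prev
        logvar_p = 2.0 * self.log_scale
        var_q = logvar_q.exp()
        return 0.5 * (logvar_p - logvar_q
                      + (var_q + (mu_q - mu_p).pow(2)) / logvar_p.exp()
                      - 1.0)
```

At initialization (coefficient 0, unit scale) this reduces to the static standard-normal prior; a learned coefficient near 1 makes the unit's latent persistent over time, one concrete reading of the temporal constraints described above.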