Physical Analogue Kolmogorov-Arnold Networks based on Reconfigurable Nonlinear-Processing Units

Kolmogorov-Arnold Networks (KANs) shift neural computation from linear layers to learnable nonlinear edge functions, but implementing these nonlinearities efficiently in hardware remains an open challenge. Here we introduce a physical analogue KAN architecture in which edge functions are realized in materia using reconfigurable nonlinear-processing units (RNPUs): multi-terminal nanoscale silicon devices whose input-output characteristics are tuned via control voltages. By combining multiple RNPUs into an edge processor and assembling these blocks into a reconfigurable analogue KAN (aKAN) architecture with integrated mixed-signal interfacing, we establish a realistic system-level hardware implementation that enables compact KAN-style regression and classification with programmable nonlinear transformations. Using experimentally calibrated RNPU models and hardware measurements, we demonstrate accurate function approximation across tasks of increasing complexity while requiring fewer trainable parameters than, or a number comparable to, multilayer perceptrons (MLPs). System-level estimates indicate an energy per inference of $\sim 250$ pJ and an end-to-end inference latency of $\sim 600$ ns for a representative workload, corresponding to a $\sim 10^{2}$ to $10^{3}\times$ reduction in energy accompanied by a $\sim 10\times$ reduction in area compared to a digital fixed-point MLP at similar approximation error. These results establish RNPUs as scalable, hardware-native nonlinear computing primitives and identify analogue KAN architectures as a realistic silicon-based pathway toward energy-, latency-, and footprint-efficient analogue neural-network hardware, particularly for edge inference.
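To make the structure described in the abstract concrete, the following is a minimal software sketch of a KAN-style layer in which every edge function is a weighted sum of several tunable nonlinear units. It is an illustration under stated assumptions, not the authors' device model: the `tanh` surrogate, the two-parameter control vector (gain, offset), and the names `akan_layer` and `n_trainable` are all hypothetical. A real RNPU has a measured, multi-terminal voltage response that this toy function does not capture.

```python
import numpy as np

def akan_layer(x, controls, weights):
    """Evaluate one KAN-style layer: y_j = sum_i phi_ji(x_i).

    x        : (n_in,) input vector
    controls : (n_out, n_in, n_units, 2) per-unit (gain, offset); an assumed
               stand-in for RNPU control voltages, not a calibrated model
    weights  : (n_out, n_in, n_units) per-unit output scaling

    Each edge function phi_ji is a weighted sum of n_units surrogate units,
    and each output node simply adds its incoming edges (no weight matrix
    followed by a fixed activation, as an MLP would use).
    """
    gain = controls[..., 0]
    offset = controls[..., 1]
    # Broadcast x over output nodes and units: shape (n_out, n_in, n_units).
    units = np.tanh(gain * x[None, :, None] + offset)
    # Sum over units (edge function), then over inputs (node summation).
    return (weights * units).sum(axis=(1, 2))

def n_trainable(controls, weights):
    """Count trainable scalars in this toy parameterization."""
    return controls.size + weights.size

# Usage: a 2-input, 3-output layer with 4 surrogate units per edge.
rng = np.random.default_rng(0)
controls = rng.normal(size=(3, 2, 4, 2))
weights = rng.normal(size=(3, 2, 4))
y = akan_layer(np.array([0.1, -0.3]), controls, weights)
print(y.shape, n_trainable(controls, weights))
```

In this sketch the nonlinearity itself is the trainable object, which is the property the abstract attributes to KANs; how the control parameters map onto physical voltages is left open here.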

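The system-level figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below takes the approximate values as given ($\sim$250 pJ per inference, $\sim$600 ns latency, a $10^{2}$ to $10^{3}\times$ energy reduction) and derives what they imply. It assumes inferences run back-to-back without pipelining and ignores idle or static power; these assumptions are ours, and the outputs are order-of-magnitude estimates only.

```python
# Approximate figures quoted in the abstract.
E_AKAN_J = 250e-12               # energy per inference, ~250 pJ
T_INFER_S = 600e-9               # end-to-end latency, ~600 ns
ENERGY_REDUCTION = (1e2, 1e3)    # stated reduction vs. digital fixed-point MLP

def implied_baseline_energy(e_akan, factors):
    """Energy per inference the digital baseline would need for the stated ratios."""
    return tuple(e_akan * f for f in factors)

def average_power(e_akan, t_infer):
    """Average power while inferring continuously (assumes no idle power)."""
    return e_akan / t_infer

def sequential_rate(t_infer):
    """Inferences per second if each starts when the previous one ends."""
    return 1.0 / t_infer

lo, hi = implied_baseline_energy(E_AKAN_J, ENERGY_REDUCTION)
print(f"implied digital baseline: {lo * 1e9:.0f} nJ to {hi * 1e9:.0f} nJ per inference")
print(f"average power: {average_power(E_AKAN_J, T_INFER_S) * 1e3:.2f} mW")
print(f"sequential throughput: {sequential_rate(T_INFER_S) / 1e6:.2f} M inferences/s")
```

Under these assumptions the stated ratios correspond to a digital baseline of roughly 25 nJ to 250 nJ per inference, an average power of about 0.42 mW, and about 1.67 million inferences per second.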