Physical Analog Kolmogorov-Arnold Networks based on Reconfigurable Nonlinear-Processing Units

Kolmogorov-Arnold Networks (KANs) shift neural computation from linear layers to learnable nonlinear edge functions, but implementing these nonlinearities efficiently in hardware remains an open challenge. Here we introduce a physical analog KAN architecture in which edge functions are realized in materia using reconfigurable nonlinear-processing units (RNPUs): multi-terminal nanoscale silicon devices whose input-output characteristics are tuned via control voltages. By combining multiple RNPUs into an edge processor and assembling these blocks into a reconfigurable analog KAN (aKAN) architecture with integrated mixed-signal interfacing, we establish a realistic system-level hardware implementation that enables compact KAN-style regression and classification with programmable nonlinear transformations. Using experimentally calibrated RNPU models and hardware measurements, we demonstrate accurate function approximation across increasing task complexity while requiring fewer trainable parameters than, or a number comparable to, multilayer perceptrons (MLPs). System-level estimates indicate an energy per inference of $\sim$250 pJ and an end-to-end inference latency of $\sim$600 ns for a representative workload, corresponding to a $\sim 10^{2}$-$10^{3}\times$ reduction in energy and a $\sim 10\times$ reduction in area compared to a digital fixed-point MLP at similar approximation error. These results establish RNPUs as scalable, hardware-native nonlinear computing primitives and identify analog KAN architectures as a realistic silicon-based pathway toward energy-, latency-, and footprint-efficient analog neural-network hardware, particularly for edge inference.
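
To make the structural idea concrete, the following is a minimal software sketch of a KAN-style layer in which every edge carries its own tunable nonlinear function and each output node simply sums its incoming edges. It is an illustration under stated assumptions, not the hardware model used in this work: the `tanh` unit, the `(gain, slope, offset)` parameters, and the names `edge_function`, `kan_layer`, and `count_parameters` are hypothetical stand-ins. A real RNPU response is set by control voltages and would come from measurements or a calibrated device model.

```python
import math


def edge_function(x, units):
    """Hypothetical stand-in for one edge processor.

    `units` is a list of (gain, slope, offset) tuples, one per nonlinear unit.
    The tanh shape is an assumption for illustration only; it is NOT the
    measured RNPU characteristic.
    """
    return sum(gain * math.tanh(slope * x + offset) for gain, slope, offset in units)


def kan_layer(inputs, edge_controls):
    """Evaluate one KAN-style layer.

    edge_controls[j][i] holds the units of the edge from input i to output j.
    Each output is the plain sum of its edge functions; there is no separate
    weight matrix followed by a fixed activation, as there would be in an MLP.
    """
    return [
        sum(edge_function(x, row[i]) for i, x in enumerate(inputs))
        for row in edge_controls
    ]


def count_parameters(edge_controls):
    """Count tunable values: three per unit in this toy parameterization."""
    return sum(3 * len(units) for row in edge_controls for units in row)


# Toy layer: 2 inputs -> 1 output, with 2 hypothetical units per edge.
controls = [[
    [(1.0, 2.0, 0.0), (-0.5, 0.5, 0.0)],  # edge x0 -> y0
    [(0.3, 1.0, 0.0), (0.7, 3.0, 0.0)],   # edge x1 -> y0
]]
y = kan_layer([0.2, -0.4], controls)
print(y, count_parameters(controls))
```

In this toy parameterization, all trainable values live on the edges, which is why the parameter count scales with the number of edges times the number of units per edge rather than with a dense weight matrix.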
