Multi-layer perceptrons (MLPs), widely used in modern AI applications, suffer from limited real-time performance due to intensive memory access overhead. Kolmogorov--Arnold Networks (KANs) have attracted increasing attention as an alternative architecture that is structurally similar to MLPs but offers better parameter efficiency. However, the lack of dedicated hardware support limits the practical performance benefits of KANs. Moreover, since many edge workloads still rely heavily on MLPs, accelerators designed exclusively for KANs are inefficient and impractical. In this work, we present VIKIN, a reconfigurable accelerator that efficiently supports both KAN and MLP inference on unified hardware. VIKIN introduces a pipelined execution mode and two-stage sparsity support for efficient KAN processing, and enables a parallel execution mode that improves MLP throughput under the same sparsity framework. Experiments on real-world datasets demonstrate that replacing MLPs with KANs on VIKIN achieves a $1.28\times$ speedup while reducing accuracy loss by $19.58\%$. For a higher-accuracy KAN model requiring $3.29\times$ more operations, VIKIN incurs only $1.24\times$ latency overhead compared with the baseline KAN model. In addition, VIKIN achieves a $1.25\times$ speedup and $4.87\times$ higher energy efficiency than a representative edge GPU on KAN workloads.