Kolmogorov--Arnold Networks (KANs) replace linear weights with spline-based functions, offering strong expressivity but posing challenges for low-precision deployment due to heterogeneous parameter distributions. We introduce QuantKAN, the first unified framework for quantization-aware training (QAT) and post-training quantization (PTQ) of KANs. The framework employs branch-aware quantizers for base and spline parameters and extends modern QAT and PTQ methods to spline-based layers across EfficientKAN, FastKAN, PyKAN, and KAGN. Experiments on MNIST, CIFAR-10/100, TinyImageNet, and ImageNet provide the first unified QAT/PTQ benchmarks for KANs and show that DSQ is the most robust QAT method at aggressive low-bit settings, while GPTQ is the strongest PTQ method at moderate precision. Sensitivity analyses reveal architecture-specific failure modes: spline/basis parameters dominate in FastKAN, while base or scaling parameters dominate in EfficientKAN, KAGN, and PyKAN. Vivado HLS estimates on a Xilinx UltraScale+ device further suggest up to 3.32$\times$ higher throughput and 7.7$\times$ lower estimated dynamic energy per inference under W4A4, exposing a residual \emph{basis-evaluation tax} that motivates basis-aware microarchitecture. QuantKAN is available at https://github.com/OSU-STARLAB/QuantKAN/.
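To illustrate why branch-aware quantizers can matter when base and spline parameters have different distributions, the following is a minimal sketch, not QuantKAN's implementation. It assumes symmetric per-tensor uniform quantization, and the parameter values, the 4-bit setting, and the helper names `fake_quantize` and `mean_abs_error` are all hypothetical choices made for this example.

```python
def fake_quantize(values, bits, scale=None):
    """Symmetric uniform fake quantization: round to signed ints, then dequantize.

    If no scale is given, it is derived from the largest magnitude in `values`.
    """
    qmax = 2 ** (bits - 1) - 1
    if scale is None:
        max_abs = max(abs(v) for v in values)
        scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))  # clamp to the signed integer range
        dequantized.append(q * scale)
    return dequantized, scale


def mean_abs_error(reference, approx):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


# Hypothetical toy parameters: the base branch has a much wider range
# than the spline coefficients (a heterogeneous distribution).
base_weights = [1.6, -2.0, 0.9, -1.2]
spline_coeffs = [0.05, -0.08, 0.02, 0.07, -0.03]
BITS = 4

# Shared quantizer: one scale computed over both branches together.
all_params = base_weights + spline_coeffs
shared_scale = max(abs(v) for v in all_params) / (2 ** (BITS - 1) - 1)
spline_shared, _ = fake_quantize(spline_coeffs, BITS, scale=shared_scale)

# Branch-aware quantizers: each branch derives its own scale.
base_branch, base_scale = fake_quantize(base_weights, BITS)
spline_branch, spline_scale = fake_quantize(spline_coeffs, BITS)

err_shared = mean_abs_error(spline_coeffs, spline_shared)
err_branch = mean_abs_error(spline_coeffs, spline_branch)
print(f"spline error with shared scale:       {err_shared:.5f}")
print(f"spline error with branch-aware scale: {err_branch:.5f}")
```

In this toy setting the shared scale is set by the wide base branch, so every spline coefficient rounds to zero, while a separate spline scale preserves them. Real KAN layers would typically call for per-channel or learned scales, which this sketch leaves out.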