Pre-trained Vision Kolmogorov-Arnold Networks (KANs) store a dense B-spline grid on every edge, inflating prediction-head parameter counts by more than 140X relative to a comparable MLP and pushing inference into a memory-bound regime on edge accelerators. Standard magnitude pruning fails on these pre-trained models: zero-shot sparsity collapses accuracy, and restoring it requires an iterative fine-tuning loop that is impractical in deployment settings. We present SHARe-KAN, a post-training compiler that compresses spline coefficients via a Gain-Shape-Bias decomposition with a layer-shared codebook, paired with LUTHAM, an ExecuTorch runtime that maps the codebook into on-chip L2. On PASCAL VOC detection with a ResNet-50 backbone, SHARe-KAN Int8 reaches 9.3X storage compression over the Dense KAN baseline (a 6.32 MB vs. 58.67 MB prediction head) at a 2.0-point in-domain accuracy cost (80.22% vs. 82.22% mAP), with no retraining. Zero-shot transfer to COCO retains 88.9% of the Dense KAN mAP; most of this gap comes from the VQ clustering step itself, and further quantization from FP32 to Int8 costs only 1.3 retention points. The value of the approach compounds at scale: at 50 task heads, Dense KAN prediction-head storage reaches 2.9 GB while SHARe-KAN Int8 requires 211 MB, a 13.9X reduction that brings multi-expert KAN deployment within the memory budgets of contemporary edge silicon.
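To make the Gain-Shape-Bias idea concrete, the following is a minimal NumPy sketch. The abstract does not define the decomposition, so everything here is an assumption rather than the SHARe-KAN compiler itself. The sketch uses the conventional gain-shape form from vector quantization, where each edge's coefficient vector is split into a bias (its mean), a gain (the norm of the centered vector), and a unit-norm shape. The function names `gsb_decompose`, `fit_codebook`, and `reconstruct` are hypothetical, and plain k-means stands in for whatever clustering the actual method uses.

```python
import numpy as np


def gsb_decompose(coeffs):
    """Split each row into (gain, shape, bias) so that gain * shape + bias == row.

    Assumed convention: bias = row mean, gain = L2 norm of the centered row,
    shape = centered row scaled to unit norm (all zeros for a constant row).
    """
    bias = coeffs.mean(axis=1, keepdims=True)
    centered = coeffs - bias
    gain = np.linalg.norm(centered, axis=1, keepdims=True)
    safe_gain = np.where(gain > 0, gain, 1.0)  # avoid dividing by zero
    shape = centered / safe_gain
    return gain, shape, bias


def assign(shapes, codebook):
    """Index of the nearest codebook entry (Euclidean) for every shape."""
    dist = ((shapes[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)


def fit_codebook(shapes, k, iters=20, seed=0):
    """Plain k-means over shape vectors; a stand-in for the paper's VQ step."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(shapes), size=k, replace=False)
    codebook = shapes[start].copy()
    for _ in range(iters):
        idx = assign(shapes, codebook)
        for j in range(k):
            members = shapes[idx == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                codebook[j] = members.mean(axis=0)
    return codebook, assign(shapes, codebook)


def reconstruct(gain, idx, bias, codebook):
    """Rebuild approximate coefficients from per-edge scalars plus the shared codebook."""
    return gain * codebook[idx] + bias


# Toy data: 256 edges with 8 spline coefficients each (sizes are illustrative only).
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(256, 8))
gain, shape, bias = gsb_decompose(coeffs)
codebook, idx = fit_codebook(shape, k=16)
approx = reconstruct(gain, idx, bias, codebook)
```

Under these assumptions, each edge stores only two scalars and one codebook index, while the shape vectors live in a single table shared across the layer. That sharing is what would let per-head storage grow far more slowly than the dense grid as task heads are added.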