Large deep learning models such as BERT and ResNet achieve state-of-the-art performance but are costly to deploy at the edge due to their size and compute demands. We present RMT-KD, a compression method that leverages Random Matrix Theory (RMT) for knowledge distillation to iteratively reduce network size. Instead of pruning or heuristic rank selection, RMT-KD preserves only informative directions identified via the spectral properties of hidden representations. RMT-based causal reduction is applied layer by layer with self-distillation to maintain stability and accuracy. On GLUE and CIFAR-10, RMT-KD achieves up to 80% parameter reduction with only 2% accuracy loss, delivering 2.8x faster inference and nearly halved power consumption. These results establish RMT-KD as a mathematically grounded approach to network distillation.
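The abstract's core idea of keeping only informative directions of the hidden representations can be sketched with Random Matrix Theory: eigenvalues of the activation covariance that fall below the Marchenko–Pastur upper edge are consistent with noise, so only directions above it are retained. The following is a minimal illustrative sketch under that assumption; the function name, the known noise variance `sigma2`, and the exact selection rule are hypothetical and not taken from the paper.

```python
import numpy as np

def rmt_informative_directions(H, sigma2=1.0):
    """Select directions of hidden representations H (n_samples, p)
    whose covariance eigenvalues exceed the Marchenko-Pastur edge."""
    n, p = H.shape
    Hc = H - H.mean(axis=0)          # center the activations
    cov = Hc.T @ Hc / n              # sample covariance, shape (p, p)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Marchenko-Pastur upper edge: noise eigenvalues concentrate below
    # sigma^2 * (1 + sqrt(p/n))^2 when entries have variance sigma^2.
    q = p / n
    lam_plus = sigma2 * (1 + np.sqrt(q)) ** 2
    keep = eigvals > lam_plus        # informative (signal) directions
    return eigvecs[:, keep], eigvals[keep]
```

Projecting a layer's weights onto the returned basis would shrink that layer to the retained rank, which is one plausible reading of the layer-by-layer reduction the abstract describes.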