Deep learning has advanced significantly in recent years, at the cost of growing training, inference, and model storage overhead. Existing model compression methods reduce the number of model parameters while maintaining high accuracy, but they inevitably require re-training the compressed model or impose architectural constraints. To overcome these limitations, this paper presents a novel framework, termed \textbf{K}nowledge \textbf{T}ranslation (KT), in which a ``translation'' model is trained to take the parameters of a larger model as input and generate compressed parameters. The concept of KT draws inspiration from language translation, where neural networks effectively convert text between languages while preserving its meaning. Accordingly, we explore whether neural networks can likewise convert models of different sizes into one another while preserving their functionality. We propose a comprehensive framework for KT, introduce data augmentation strategies that improve performance when training data is limited, and demonstrate the feasibility of KT on the MNIST dataset. Code is available at \url{https://github.com/zju-SWJ/KT}.
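The core idea, a translation model that maps the parameters of a larger model to those of a smaller one, can be sketched as follows. This is a minimal illustration under stated assumptions, not the KT implementation: the layer sizes, the flattening scheme, and the `LinearTranslator` class (an untrained random linear map) are all hypothetical, since the abstract does not specify the translator's architecture or training objective. The sketch only shows the data flow: flatten the large model's parameters, translate them, and load the result into a smaller network of fixed architecture.

```python
import random


def param_count(sizes):
    """Number of weights plus biases in a fully connected net with these layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def unflatten(flat, sizes):
    """Split a flat parameter vector into per-layer (weights, biases) pairs."""
    layers, i = [], 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        # Weight matrix stored row by row: n_out rows of n_in entries each.
        w = [flat[i + r * n_in : i + (r + 1) * n_in] for r in range(n_out)]
        i += n_in * n_out
        b = flat[i : i + n_out]
        i += n_out
        layers.append((w, b))
    return layers


def mlp_forward(flat, sizes, x):
    """Run the MLP described by `flat` on input x, with ReLU on hidden layers."""
    layers = unflatten(flat, sizes)
    for k, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


class LinearTranslator:
    """Hypothetical stand-in for a KT translation model: a single linear map
    from large-model parameters to small-model parameters. Untrained; in KT
    the translator is a trained neural network whose design is not given here."""

    def __init__(self, n_large, n_small, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / n_large ** 0.5
        self.w = [[rng.gauss(0.0, scale) for _ in range(n_large)] for _ in range(n_small)]
        self.b = [0.0] * n_small

    def __call__(self, large_flat):
        return [
            sum(wij * pj for wij, pj in zip(row, large_flat)) + bi
            for row, bi in zip(self.w, self.b)
        ]


# Hypothetical architectures: a wider "large" MLP and a narrower "small" one.
LARGE = [4, 8, 2]
SMALL = [4, 3, 2]

rng = random.Random(1)
large_params = [rng.gauss(0.0, 0.5) for _ in range(param_count(LARGE))]
translator = LinearTranslator(param_count(LARGE), param_count(SMALL))
small_params = translator(large_params)  # "translated" compressed parameters

x = [0.1, -0.2, 0.3, 0.4]
print(param_count(LARGE), param_count(SMALL))  # prints: 58 23
print(mlp_forward(small_params, SMALL, x))     # two output logits from the small model
```

Because the small model's architecture is fixed in advance, the translator's output dimension is simply that architecture's parameter count; how the translator is trained so that the generated parameters preserve the large model's functionality is the part the framework itself addresses and is not modeled here.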