Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance on a wide range of natural language processing (NLP) tasks. However, their massive size poses major challenges for deployment in real-world applications. While numerous model compression techniques have been proposed, most are not well suited to extreme compression, where there is a significant gap in scale between the original and the compressed model. In this paper, we introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT), which effectively transfers the knowledge of LLMs to extremely small-scale models (e.g., 1% of the original scale). In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, we employ soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning. We conduct extensive experiments on low-resource tasks from the SuperGLUE and GLUE benchmarks. The results demonstrate that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.
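The retrieve-then-infer idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the knowledge store is built, indexed, or queried, so everything below is an assumption made for illustration: the `KnowledgeStore` class, the bag-of-words `embed` function (a toy stand-in for a small model's encoder), the cosine scoring, the similarity-weighted vote, and the example sentences (which stand in for LLM-generated samples) are all hypothetical. Soft prompt tuning and PPO are not modeled.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a small model's encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnowledgeStore:
    """Hypothetical store of (vector, text, label) entries extracted from an LLM."""

    def __init__(self):
        self.entries = []

    def add(self, text, label):
        self.entries.append((embed(text), text, label))

    def retrieve(self, query, k=2):
        """Return the k entries most similar to the query as (score, text, label)."""
        q = embed(query)
        scored = [(cosine(q, vec), text, label) for vec, text, label in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


def predict(store, query, k=2):
    """Similarity-weighted vote over the labels of the retrieved entries."""
    votes = Counter()
    for score, _text, label in store.retrieve(query, k):
        votes[label] += score
    return votes.most_common(1)[0][0] if votes else None


store = KnowledgeStore()
# Hypothetical samples standing in for knowledge extracted from an LLM.
store.add("the movie was wonderful and moving", "positive")
store.add("a wonderful heartfelt film", "positive")
store.add("the plot was dull and boring", "negative")
store.add("boring acting and dull script", "negative")

prediction = predict(store, "what a wonderful film")
print(prediction)
```

In this toy setup the small model does no generation of its own: it only encodes the query and reads labels off the nearest stored entries, which is one simple way a very small model could lean on knowledge produced by a much larger one.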