In natural language processing, pre-trained language models have become essential infrastructure. However, these models are often large, slow at inference, and difficult to deploy. Moreover, most mainstream pre-trained models target English, and small Chinese pre-trained models remain understudied. In this paper, we introduce MiniRBT, a small Chinese pre-trained model intended to advance research in Chinese natural language processing. MiniRBT uses a narrow and deep student model and combines whole word masking with two-stage distillation during pre-training, making it well suited to most downstream tasks. Our experiments on machine reading comprehension and text classification show that MiniRBT retains 94% of RoBERTa's performance while providing a 6.8x speedup, demonstrating its effectiveness and efficiency.
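To make the whole word masking idea mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the input has already been segmented into words by some external Chinese word segmenter, that tokens are single characters, and that a selected word is always replaced by `[MASK]` (the usual random-replacement and keep-as-is variants are omitted). The function name `whole_word_mask` and the parameters `mask_ratio` and `seed` are hypothetical.

```python
import random

MASK = "[MASK]"


def whole_word_mask(words, mask_ratio=0.15, seed=0):
    """Mask whole words: every character of a chosen word becomes [MASK].

    `words` is a list of pre-segmented words (assumption: segmentation is
    done elsewhere). Returns (tokens, labels), where labels hold the original
    character at masked positions and None everywhere else.
    """
    rng = random.Random(seed)
    total_chars = sum(len(w) for w in words)
    # Character budget to mask; at least one character is always masked.
    budget = max(1, round(total_chars * mask_ratio))

    # Visit words in random order until the budget is met.
    # The last chosen word may push the count slightly over the budget.
    order = list(range(len(words)))
    rng.shuffle(order)
    chosen, masked = set(), 0
    for i in order:
        if masked >= budget:
            break
        chosen.add(i)
        masked += len(words[i])

    tokens, labels = [], []
    for i, word in enumerate(words):
        for ch in word:
            if i in chosen:
                tokens.append(MASK)
                labels.append(ch)
            else:
                tokens.append(ch)
                labels.append(None)
    return tokens, labels


if __name__ == "__main__":
    example = ["自然", "语言", "处理", "很", "有趣"]
    toks, labs = whole_word_mask(example, mask_ratio=0.3, seed=1)
    print(toks)
    print(labs)
```

The point of the sketch is the unit of selection: masking decisions are made per word rather than per character, so a multi-character word is never left partially visible to the model.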