Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks. However, their large number of parameters hinders efficient deployment on edge devices. Binarizing BERT models significantly alleviates this issue, but at the cost of a severe accuracy drop relative to their full-precision counterparts. In this paper, we propose an efficient and robust binary ensemble BERT (BEBERT) to bridge this accuracy gap. To the best of our knowledge, this is the first work to apply ensemble techniques to binary BERTs, and the resulting BEBERT achieves superior accuracy while retaining computational efficiency. Furthermore, we remove the knowledge distillation procedures during ensembling, which speeds up training without compromising accuracy. Experimental results on the GLUE benchmark show that the proposed BEBERT significantly outperforms existing binary BERT models in accuracy and robustness, with a 2x speedup in training time. Moreover, BEBERT incurs only a negligible accuracy loss of 0.3% compared to the full-precision baseline while reducing FLOPs and model size by 15x and 13x, respectively. In addition, BEBERT outperforms other compressed BERTs in accuracy by up to 6.7%.
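To make the two ingredients named above concrete, the following is a minimal sketch of weight binarization and ensemble prediction. It is an illustration under stated assumptions, not the BEBERT implementation: the abstract does not specify the binarization scheme or how member outputs are combined, so the sketch assumes sign binarization with a mean-absolute-value scale and plain averaging of logits. Toy linear classifiers stand in for binary BERTs, and the names `binarize`, `linear_logits`, and `ensemble_predict` are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def binarize(weights: Sequence[float]) -> List[float]:
    """Map each weight to +alpha or -alpha, with alpha the mean absolute value.

    Assumption: the scaling rule is a common choice, not taken from the abstract.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def linear_logits(weight_rows: Matrix, features: Sequence[float]) -> List[float]:
    """Return one logit per class: the dot product of each row with the features."""
    return [sum(w * x for w, x in zip(row, features)) for row in weight_rows]


def ensemble_predict(members: Sequence[Matrix], features: Sequence[float]) -> int:
    """Binarize every member, average their logits, and return the argmax class.

    Assumption: logit averaging is one simple way to combine ensemble members.
    """
    all_logits = [
        linear_logits([binarize(row) for row in member], features)
        for member in members
    ]
    num_classes = len(all_logits[0])
    mean_logits = [
        sum(logits[c] for logits in all_logits) / len(all_logits)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: mean_logits[c])


# Two toy two-class members; each inner list is the weight row for one class.
toy_members = [
    [[1.0, -1.0], [-1.0, 1.0]],
    [[2.0, -2.0], [-2.0, 2.0]],
]
prediction = ensemble_predict(toy_members, [1.0, 0.0])
```

In this toy setup each member stores only the signs of its weights plus one scale per row, which is where the storage savings of binarization come from; the ensemble trades some of that saving for accuracy by evaluating several such members.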