Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to a new state of the art. In this paper, we explore the efficiency of various pre-trained language models. We pre-train a list of transformer-based models on the same amount of text and for the same number of training steps. The experimental results show that the largest improvement over the original BERT comes from adding an RNN layer to capture more contextual information for short text understanding. Our overall conclusion, however, is that structures similar to BERT yield no remarkable improvement for short text understanding. A data-centric method [12] can achieve better performance.