Pre-training language models on large text corpora is common practice in Natural Language Processing. These models are then fine-tuned to achieve the best results on a variety of downstream tasks. In this paper, we question the common practice of adding only a single output layer as a classification head on top of the network. We perform an AutoML search to find classification-head architectures that outperform the current single-layer head at only a small additional compute cost. We validate the resulting classification architecture on a variety of NLP tasks from the GLUE benchmark.
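To make the contrast concrete, the following is a minimal sketch of what "a single output layer as a classification head" means next to a slightly deeper alternative. It is illustrative only: the two-layer head, the `tanh` activation, the layer sizes, and all function names are assumptions chosen for the example, not the architecture found by the search described above. The `pooled` vector stands in for the sentence representation produced by a pre-trained encoder.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for an n_in -> n_out layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def single_layer_head(pooled, layer):
    """Baseline head: one linear layer followed by softmax."""
    return softmax(linear(pooled, *layer))


def two_layer_head(pooled, hidden_layer, output_layer):
    """Hypothetical deeper head: linear -> tanh -> linear -> softmax."""
    hidden = [math.tanh(h) for h in linear(pooled, *hidden_layer)]
    return softmax(linear(hidden, *output_layer))


rng = random.Random(0)
hidden_size, num_labels = 8, 2  # illustrative sizes, not from the paper

# Stand-in for the encoder's pooled output for one input sentence.
pooled = [rng.gauss(0, 1) for _ in range(hidden_size)]

baseline = single_layer_head(pooled, init_layer(hidden_size, num_labels, rng))
deeper = two_layer_head(pooled,
                        init_layer(hidden_size, 16, rng),
                        init_layer(16, num_labels, rng))
```

Both heads return a probability distribution over `num_labels` classes; the deeper variant adds only one small extra layer on top of the encoder, which is why alternatives of this kind can remain cheap relative to the cost of the encoder itself.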