Large-scale pre-trained language models such as BERT are popular solutions for text classification. Because these advanced methods perform so well, people now often train them directly for a few epochs and deploy the resulting model. In this opinion paper, we point out that this practice does not always yield satisfactory results. We argue that a simple baseline, such as a linear classifier on bag-of-words features, should be run alongside advanced methods. First, for many text data sets, linear methods show competitive performance, high efficiency, and robustness. Second, advanced models such as BERT achieve the best results only if properly applied. Simple baselines help to confirm whether the results of advanced models are acceptable. Our experimental results fully support these points.
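To make the idea of a "linear classifier on bag-of-words features" concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the setup used in the paper: the multi-class perceptron is a stand-in for whichever linear method one prefers, the whitespace tokenizer is deliberately crude, and the function names and the four toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (order is ignored)."""
    return Counter(text.lower().split())


def predict(weights, text):
    """Return the label whose weight vector gives the highest linear score."""
    feats = bag_of_words(text)

    def score(label):
        w = weights[label]
        return sum(w.get(tok, 0.0) * cnt for tok, cnt in feats.items())

    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(weights), key=score)


def train_perceptron(texts, labels, epochs=10):
    """Train a multi-class perceptron; returns {label: {token: weight}}."""
    weights = {label: defaultdict(float) for label in set(labels)}
    for _ in range(epochs):
        for text, gold in zip(texts, labels):
            pred = predict(weights, text)
            if pred != gold:
                # Move the gold class toward the example, the wrong class away.
                for tok, cnt in bag_of_words(text).items():
                    weights[gold][tok] += cnt
                    weights[pred][tok] -= cnt
    return weights


# Hypothetical toy data, used only to show the calling convention.
train_texts = [
    "great movie loved it",
    "wonderful acting great plot",
    "terrible movie hated it",
    "awful plot boring acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

baseline = train_perceptron(train_texts, train_labels)
train_accuracy = sum(
    predict(baseline, t) == y for t, y in zip(train_texts, train_labels)
) / len(train_texts)
print("baseline training accuracy:", train_accuracy)
```

In practice one would compute the same metric on a held-out test set for both this baseline and the fine-tuned model. If the advanced model does not clearly beat the baseline, that is a signal to revisit how it was trained before deploying it.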