Short text classification is a crucial and challenging task in Natural Language Processing, and numerous highly specialized short text classifiers have been developed for it. However, recent short text research has largely left unexploited the state-of-the-art (SOTA) methods for traditional text classification, in particular the pure use of Transformers. In this work, we examine the performance of a variety of short text classifiers as well as the top-performing traditional text classifier. We further evaluate these methods on two new real-world short text datasets, addressing the problem of over-reliance on benchmark datasets with a limited range of characteristics. Our experiments unambiguously demonstrate that Transformers achieve SOTA accuracy on short text classification tasks, raising the question of whether specialized short text techniques are necessary.