The popularity of graph neural networks has triggered a resurgence of graph-based methods for single-label and multi-label text classification. However, it is unclear whether these graph-based methods are beneficial compared to standard machine learning methods and modern pretrained language models. We compare a rich selection of bag-of-words, sequence-based, graph-based, and hierarchical methods for text classification. We aggregate results from the literature over 5 single-label and 7 multi-label datasets and run our own experiments. Our findings unambiguously demonstrate that, for both single-label and multi-label classification tasks, the graph-based methods fail to outperform fine-tuned language models and sometimes even perform worse than standard machine learning methods such as a multilayer perceptron (MLP) on a bag-of-words representation. This calls into question the enormous amount of effort put into the development of new graph-based methods in recent years and the promises they make for text classification. Given our extensive experiments, we confirm that pretrained language models remain state-of-the-art in text classification despite all recent specialized advances. We argue that future work in text classification should thoroughly test against strong baselines like MLPs to properly assess the true scientific progress. The source code is available at https://github.com/drndr/multilabel-text-clf