The popularity of graph neural networks has triggered a resurgence of graph-based methods for single-label and multi-label text classification. However, it is unclear whether these graph-based methods are beneficial compared to standard machine learning methods and modern pretrained language models. We compare a rich selection of bag-of-words, sequence-based, graph-based, and hierarchical methods for text classification. We aggregate results from the literature over 5 single-label and 7 multi-label datasets and run our own experiments. Our findings unambiguously demonstrate that, for both single-label and multi-label classification tasks, the graph-based methods fail to outperform fine-tuned language models and sometimes even perform worse than standard machine learning methods such as a multilayer perceptron (MLP) on a bag-of-words. This calls into question the enormous effort put into developing new graph-based methods in recent years and the promises they make for text classification. Based on our extensive experiments, we confirm that pretrained language models remain state-of-the-art in text classification despite all recent specialized advances. We argue that future work in text classification should thoroughly test against strong baselines like MLPs to properly assess true scientific progress. The source code is available at: https://github.com/drndr/multilabel-text-clf