Text classification is the most essential and fundamental problem in Natural Language Processing. While many recent text classification models apply sequential deep learning techniques, graph neural network-based models can directly handle complex structured text data and exploit global information. Many real-world text classification applications can be naturally cast as a graph that captures word, document, and corpus-level global features. In this survey, we cover methods up to 2023, including corpus-level and document-level graph neural networks. We discuss each of these methods in detail, covering both the graph construction mechanisms and the graph-based learning process. Beyond the technical survey, we examine open issues and future directions in text classification using graph neural networks. We also cover datasets, evaluation metrics, and experiment design, and present a summary of published performance on publicly available benchmarks. Finally, we present a comprehensive comparison of the different techniques and identify the pros and cons of the various evaluation metrics.