Text classification is a fundamental problem in natural language processing. While many recent text classification models apply sequential deep learning techniques, graph neural network-based models can operate directly on complex structured text data and exploit global information. Many real-world text classification applications can be naturally cast as graphs that capture word-, document-, and corpus-level features. In this survey, we bring the coverage of methods up to 2023, including both corpus-level and document-level graph neural networks. We discuss each of these methods in detail, covering the graph construction mechanisms and the graph-based learning process. Beyond the technical survey, we examine open issues and future directions for text classification with graph neural networks. We also cover datasets, evaluation metrics, and experiment design, and summarize published performance on publicly available benchmarks. Finally, we present a comprehensive comparison of the different techniques and discuss the pros and cons of the various evaluation metrics.
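To make the corpus-level graph construction mentioned above concrete, the following is a minimal sketch of a heterogeneous word-document graph in the style popularized by TextGCN. The function name `build_text_graph` is illustrative, and the word-word edges here use simple per-document co-occurrence counts as a stand-in for the sliding-window PMI weighting used in the original method; this is an assumption for brevity, not a method taken from the survey itself.

```python
import math
from collections import Counter

def build_text_graph(docs):
    """Sketch of a corpus-level word-document graph (TextGCN-style).

    Nodes: one per document plus one per vocabulary word.
    Edges: document-word edges weighted by TF-IDF;
           word-word edges weighted by co-occurrence counts
           (the original method uses PMI over sliding windows).
    Returns (nodes, edges) where edges maps (u, v) -> weight.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # document frequency of each word, for the IDF term
    df = Counter(w for toks in tokenized for w in set(toks))

    edges = {}
    # document-word edges: TF-IDF weight
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        for w, count in tf.items():
            weight = (count / len(toks)) * math.log(n_docs / df[w])
            edges[(f"doc{i}", w)] = weight

    # word-word edges: count how many documents each pair co-occurs in
    for toks in tokenized:
        unique = sorted(set(toks))
        for a in range(len(unique)):
            for b in range(a + 1, len(unique)):
                pair = (unique[a], unique[b])
                edges[pair] = edges.get(pair, 0) + 1

    nodes = [f"doc{i}" for i in range(n_docs)] + vocab
    return nodes, edges
```

A GNN is then trained over this single graph, with document nodes receiving class labels; document-level approaches, by contrast, build one small graph per document.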