Text classification is one of the most fundamental and essential problems in Natural Language Processing. While many recent text classification models apply sequential deep learning techniques, graph neural network (GNN)-based models can directly handle complex structured text data and exploit global information. Many real-world text classification applications can be naturally cast as a graph that captures words, documents, and global corpus features. In this survey, we cover methods published up to 2023, including both corpus-level and document-level graph neural networks. We discuss each method in detail, covering its graph construction mechanism and its graph-based learning process. Beyond the technical survey, we examine open issues and future directions for text classification with graph neural networks. We also cover datasets, evaluation metrics, and experimental design, and we summarize published performance on publicly available benchmarks. Finally, we present a comprehensive comparison of the different techniques and identify the pros and cons of the various evaluation metrics.