Text classification plays an important role in downstream text-related tasks such as sentiment analysis, fake news detection, and public opinion analysis. Recently, text classification based on Graph Neural Networks (GNNs) has made significant progress owing to their strong capability for learning structural relationships. However, these approaches still face two major limitations. First, they fail to fully consider the diverse structural information across word pairs, e.g., co-occurrence, syntax, and semantics. Second, they neglect sequence information in the module that learns the text graph structure, and they cannot classify texts containing new words and relations. In this paper, we propose a novel graph-sequence learning model for inductive text classification (TextGSL) to address these issues. Specifically, we construct a single text-level graph over all words in each text and establish different edge types based on the diverse relationships between word pairs. Building on this graph, we design an adaptive multi-edge message-passing paradigm to aggregate the diverse structural information between word pairs. In addition, TextGSL captures sequential information in the text by incorporating Transformer layers. As a result, TextGSL can learn more discriminative text representations. We compare TextGSL comprehensively with several strong baselines. Experimental results on diverse benchmark datasets demonstrate that TextGSL outperforms these baselines in terms of accuracy.
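The text-level graph and multi-edge aggregation described in the abstract can be illustrated with a minimal sketch. This is not the TextGSL implementation: the function names (`build_cooccurrence_edges`, `multi_edge_step`), the sliding-window rule for co-occurrence edges, the mean aggregation, and the softmax over one scalar score per edge type (standing in for the "adaptive" weighting) are all assumptions made for illustration. Syntactic and semantic edges would have to come from an external parser or embedding model and are passed in here as plain edge sets. The Transformer layers are omitted.

```python
import math
from collections import defaultdict


def build_cooccurrence_edges(tokens, window=2):
    """Map each unique word to a node id and link words that co-occur
    within `window` positions (hypothetical construction rule)."""
    index = {}
    for tok in tokens:
        index.setdefault(tok, len(index))
    edges = set()
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            a, b = index[tok], index[other]
            if a != b:  # skip self-loops from repeated words
                edges.add((a, b))
                edges.add((b, a))
    return index, edges


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_edge_step(features, typed_edges, type_scores):
    """One message-passing step over several edge types.

    features:    list of per-node feature vectors (lists of floats)
    typed_edges: dict edge_type -> set of (src, dst) node-id pairs
    type_scores: dict edge_type -> unnormalised importance score
                 (a learned parameter in a real model; fixed here)
    """
    types = sorted(typed_edges)
    weights = dict(zip(types, softmax([type_scores[t] for t in types])))
    dim = len(features[0])
    # Residual connection: start from each node's own features.
    out = [list(vec) for vec in features]
    for t in types:
        incoming = defaultdict(list)
        for src, dst in typed_edges[t]:
            incoming[dst].append(src)
        for dst, srcs in incoming.items():
            for d in range(dim):
                mean = sum(features[s][d] for s in srcs) / len(srcs)
                out[dst][d] += weights[t] * mean
    return out


if __name__ == "__main__":
    tokens = "the plot was thin but the acting was strong".split()
    index, cooc = build_cooccurrence_edges(tokens, window=2)
    # Toy one-dimensional features; a real model would use word embeddings.
    feats = [[float(i)] for i in range(len(index))]
    # The "syntax" edge below is made up purely to show a second edge type.
    typed = {"cooccurrence": cooc, "syntax": {(index["plot"], index["thin"])}}
    print(multi_edge_step(feats, typed, {"cooccurrence": 0.0, "syntax": 0.0}))
```

Because the graph is built per text from that text's own words, a text containing unseen words still yields a valid graph, which is the sense in which a text-level construction supports inductive classification. In a trainable version the per-type scores would be parameters updated by gradient descent, and the node outputs would feed the sequence layers.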