Text classification is a fundamental problem in information retrieval with many real-world applications, such as predicting the topics of online articles and the categories of e-commerce product descriptions. However, low-resource text classification, where few or no labeled samples are available, poses a serious challenge for supervised learning. Meanwhile, much text data is inherently grounded in a network structure, such as a hyperlink or citation network for online articles, or a user-item purchase network for e-commerce products. These graph structures capture rich semantic relationships, which can potentially augment low-resource text classification. In this paper, we propose a novel model called Graph-Grounded Pre-training and Prompting (G2P2) to address low-resource text classification with a two-pronged approach. During pre-training, we propose three graph interaction-based contrastive strategies to jointly pre-train a graph-text model; during downstream classification, we explore handcrafted discrete prompts and continuous prompt tuning for the jointly pre-trained model to achieve zero- and few-shot classification, respectively. In addition, to generalize continuous prompts to unseen classes, we propose conditional prompt tuning on graphs (G2P2$^*$). Extensive experiments on four real-world datasets demonstrate the strength of G2P2 in zero- and few-shot low-resource text classification tasks, and illustrate the advantage of G2P2$^*$ in dealing with unseen classes.