Text classification is a fundamental problem in information retrieval with many real-world applications, such as predicting the topics of online articles and the categories of e-commerce product descriptions. However, low-resource text classification, where few or no labeled samples are available, poses a serious challenge for supervised learning. Meanwhile, much text data is inherently grounded in a network structure, such as a hyperlink or citation network for online articles, or a user-item purchase network for e-commerce products. These graph structures capture rich semantic relationships that can potentially augment low-resource text classification. In this paper, we propose a novel model called Graph-Grounded Pre-training and Prompting (G2P2), which addresses low-resource text classification with a two-pronged approach. During pre-training, we propose three graph interaction-based contrastive strategies to jointly pre-train a graph-text model; during downstream classification, we explore handcrafted discrete prompts and continuous prompt tuning for the jointly pre-trained model to achieve zero- and few-shot classification, respectively. In addition, to generalize continuous prompts to unseen classes, we propose conditional prompt tuning on graphs (G2P2$^*$). Extensive experiments on four real-world datasets demonstrate the strength of G2P2 in zero- and few-shot low-resource text classification tasks, and illustrate the advantage of G2P2$^*$ in handling unseen classes.
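To make the two stages described in the abstract more concrete, the following is a minimal sketch, not the G2P2 implementation. It assumes that documents and graph nodes have already been encoded into fixed-size vectors by some text and graph encoders, and it shows only a generic symmetric contrastive objective between paired text and node embeddings, rather than the three specific strategies the paper proposes. The zero-shot step assumes that each class is represented by the embedding of a handcrafted prompt and that a document is assigned to the most similar class. All function names, the temperature value, and the toy vectors are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_logits(rows_a, rows_b, temperature):
    """Pairwise cosine similarities between two sets of vectors, divided by a temperature."""
    a = [normalize(r) for r in rows_a]
    b = [normalize(r) for r in rows_b]
    return [[sum(x * y for x, y in zip(ra, rb)) / temperature for rb in b] for ra in a]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_emb, node_emb, temperature=0.1):
    """Assumed objective: pull each text toward its own node and away from the others,
    averaged over the text-to-node and node-to-text directions."""
    logits = cosine_logits(text_emb, node_emb, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def zero_shot_predict(doc_emb, class_prompt_embs):
    """Assumed zero-shot rule: return the index of the class prompt most similar to the document."""
    scores = cosine_logits([doc_emb], class_prompt_embs, 1.0)[0]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy embeddings (hypothetical): two documents and their two graph nodes.
texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_nodes = [[1.0, 0.0], [0.0, 1.0]]
swapped_nodes = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(texts, aligned_nodes)
loss_swapped = symmetric_contrastive_loss(texts, swapped_nodes)
predicted_class = zero_shot_predict([1.0, 0.1], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setting the loss is small when each text embedding matches its own node embedding and large when the pairing is swapped, which is the behavior a contrastive pre-training objective is meant to reward. Few-shot continuous prompt tuning would replace the fixed class prompt embeddings with learnable ones, which this sketch does not cover.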