Recently, various pre-trained language models (PLMs) have demonstrated impressive performance on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult for them to maintain consistent performance in complex structured scenarios, such as hierarchical text classification (HTC), especially when downstream data are extremely scarce. The main challenge is how to transfer the unstructured semantic space in PLMs to the downstream domain hierarchy. Unlike previous work on HTC, which directly performs multi-label classification or uses graph neural networks (GNNs) to inject the label hierarchy, in this work we study HTC under a few-shot setting, adapting the knowledge in PLMs from an unstructured form to the downstream hierarchy. Technically, we design a simple yet effective method named Hierarchical Iterative Conditional Random Field (HierICRF), which searches for the most domain-challenging directions and carefully formulates domain-hierarchy adaptation as a hierarchical iterative language modeling problem; it then encourages the model to perform hierarchical consistency self-correction during inference, thereby achieving knowledge transfer while preserving hierarchical consistency. We apply HierICRF to various architectures, and extensive experiments on two popular HTC datasets demonstrate that prompting with HierICRF significantly boosts few-shot HTC performance, with average Micro-F1 gains of 1.50% to 28.80% and Macro-F1 gains of 1.50% to 36.29% over the previous state-of-the-art (SOTA) baselines under few-shot settings, while maintaining SOTA hierarchical consistency.
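To illustrate the idea of hierarchical consistency preservation during inference, the following is a minimal sketch (not the paper's actual HierICRF implementation) of CRF-style Viterbi decoding over a label hierarchy, where only parent-to-child transitions are allowed, so any decoded label path is hierarchically consistent by construction. The hierarchy, labels, and scores here are all hypothetical.

```python
import math

# Hypothetical two-level label hierarchy: top-level labels and their children.
hierarchy = {"news": ["politics", "sports"], "science": ["physics", "biology"]}

def viterbi(emissions):
    """Decode the highest-scoring hierarchically consistent label path.

    emissions: list of {label: score} dicts, one per hierarchy level.
    Transitions are restricted to parent -> child, acting like a hard
    CRF transition matrix that forbids inconsistent label pairs.
    """
    # Level 0: only top-level labels are valid starting points.
    best = {l: (emissions[0].get(l, -math.inf), [l]) for l in hierarchy}
    for level in range(1, len(emissions)):
        nxt = {}
        for parent, (score, path) in best.items():
            # Expand only along edges of the hierarchy.
            for child in hierarchy.get(parent, []):
                s = score + emissions[level].get(child, -math.inf)
                if child not in nxt or s > nxt[child][0]:
                    nxt[child] = (s, path + [child])
        best = nxt
    return max(best.values())[1]

# Even though "physics" has the highest level-2 score, pairing it with
# "news" is impossible; only consistent paths are explored.
print(viterbi([{"news": 0.9, "science": 0.4},
               {"physics": 0.95, "sports": 0.7}]))  # -> ['news', 'sports']
```

The same constraint could instead be encoded as a soft CRF transition score learned during training; the hard version above simply makes the consistency guarantee explicit.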