Recent years have witnessed growing interest in prompt-based learning, in which models can be trained on only a few annotated instances, making them suitable for low-resource settings. When prompt-based learning is used for text classification, the goal is to have a pre-trained language model (PLM) predict a missing token in a pre-defined template given an input text; the predicted token can then be mapped to a class label. However, PLMs built on the transformer architecture tend to generate similar output embeddings, making it difficult to discriminate between different class labels. The problem is further exacerbated in classification tasks involving many fine-grained class labels. In this work, we alleviate this information diffusion issue, i.e., different tokens sharing a large proportion of similar information after passing through multiple stacked self-attention layers in a transformer. We propose a calibration method built on feature transformations through rotation and scaling that maps a PLM-encoded embedding into a new metric space to guarantee the distinguishability of the resulting embeddings. Furthermore, we take advantage of hyperbolic embeddings to capture the hierarchical relations among fine-grained class-associated token embeddings through a coarse-to-fine metric learning strategy, which enhances the distinguishability of the learned output embeddings. Extensive experiments on three datasets under various settings demonstrate the effectiveness of our approach. Our code can be found at https://github.com/donttal/TARA.
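As a toy illustration of the two geometric ideas mentioned above, the following sketch applies a fixed rotation and per-axis scaling to two nearly parallel 2-D vectors, and computes a hyperbolic distance. It is not the method from the paper: the function names (`rotate`, `scale`, `calibrate`, `poincare_distance`), the hand-picked angle and scale factors, and the choice of the Poincaré ball as the hyperbolic model are all assumptions made for illustration, whereas the actual approach presumably learns its transformations from data.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def scale(v, factors):
    """Scale each coordinate of v by the matching factor."""
    return tuple(f * x for f, x in zip(factors, v))

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def calibrate(v, theta=-math.pi / 4, factors=(0.1, 1.0)):
    """Hypothetical calibration: rotate so the shared direction lies on
    the first axis, then shrink that axis (values chosen by hand)."""
    return scale(rotate(v, theta), factors)

def poincare_distance(u, v):
    """Distance in the Poincare ball model (assumed here; points must
    have Euclidean norm strictly below 1)."""
    def sq(w):
        return sum(x * x for x in w)
    diff = tuple(a - b for a, b in zip(u, v))
    return math.acosh(1 + 2 * sq(diff) / ((1 - sq(u)) * (1 - sq(v))))

if __name__ == "__main__":
    # Two embeddings dominated by a shared component along (1, 1).
    a, b = (1.1, 0.9), (0.9, 1.1)
    print("cosine before:", cosine(a, b))
    print("cosine after: ", cosine(calibrate(a), calibrate(b)))
    print("hyperbolic distance:", poincare_distance((0.0, 0.0), (0.5, 0.0)))
```

In this toy case the two vectors start with a cosine similarity of about 0.98 and become essentially orthogonal after the transformation, because the rotation aligns their shared component with one axis and the scaling suppresses it. Rotation alone would leave the cosine similarity unchanged; it is the anisotropic scaling applied after it that separates the vectors.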