Recent years have witnessed increasing interest in prompt-based learning, in which models can be trained on only a few annotated instances, making them suitable for low-resource settings. When using prompt-based learning for text classification, the goal is to use a pre-trained language model (PLM) to predict a missing token in a pre-defined template given an input text; the predicted token can then be mapped to a class label. However, PLMs built on the transformer architecture tend to generate similar output embeddings, making it difficult to discriminate between different class labels. The problem is further exacerbated in classification tasks involving many fine-grained class labels. In this work, we alleviate this information diffusion issue, i.e., different tokens sharing a large proportion of similar information after passing through multiple stacked self-attention layers in a transformer, by proposing a calibration method built on feature transformations through rotation and scaling, which maps a PLM-encoded embedding into a new metric space to guarantee the distinguishability of the resulting embeddings. Furthermore, we take advantage of hyperbolic embeddings to capture the hierarchical relations among fine-grained class-associated token embeddings through a coarse-to-fine metric learning strategy, which enhances the distinguishability of the learned output embeddings. Extensive experiments on three datasets under various settings demonstrate the effectiveness of our approach. Our code can be found at https://github.com/donttal/TARA.
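To make the rotation-and-scaling idea concrete, the following is a minimal two-dimensional sketch, not the method's actual implementation. The abstract does not specify how the transformation is parameterized, so the angle `theta`, the scaling `factors`, the toy embeddings `u` and `v`, and all function names here are assumptions chosen by hand for illustration; in a trained model such parameters would be learned rather than fixed. The example shows two embeddings that share a large common component becoming far easier to separate once the shared direction is rotated onto an axis and shrunk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rotate_2d(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def scale(v, factors):
    """Scale each coordinate by its own factor."""
    return tuple(f * a for f, a in zip(factors, v))


def calibrate(v, theta, factors):
    """Hypothetical calibration: rotation followed by per-axis scaling."""
    return scale(rotate_2d(v, theta), factors)


# Toy embeddings (assumed): both are dominated by the shared direction (1, 1).
u = (11.0, 9.0)
v = (9.0, 11.0)

# Hand-picked parameters (assumed): rotate the shared direction onto the
# x-axis, then shrink that axis so the remaining difference dominates.
theta = -math.pi / 4
factors = (0.1, 1.0)

before = cosine(u, v)
after = cosine(calibrate(u, theta, factors), calibrate(v, theta, factors))

print(f"cosine similarity before calibration: {before:.4f}")
print(f"cosine similarity after calibration:  {after:.4f}")
```

Before calibration the two vectors are nearly parallel (cosine similarity of about 0.98); afterwards they are close to orthogonal. Note that rotation alone leaves cosine similarity unchanged; in this sketch it is the combination with axis-wise scaling that alters the geometry.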