Recent years have witnessed increasing interest in prompt-based learning, in which models can be trained on only a few annotated instances, making it suitable for low-resource settings. When using prompt-based learning for text classification, the goal is to use a pre-trained language model (PLM) to predict a missing token in a pre-defined template given an input text; the predicted token is then mapped to a class label. However, PLMs built on the transformer architecture tend to generate similar output embeddings, making it difficult to discriminate between different class labels. The problem is further exacerbated in classification tasks involving many fine-grained class labels. In this work, we alleviate this information diffusion issue, i.e., different tokens sharing a large proportion of similar information after passing through multiple stacked self-attention layers in a transformer, by proposing a calibration method built on feature transformations through rotation and scaling, which maps a PLM-encoded embedding into a new metric space to guarantee the distinguishability of the resulting embeddings. Furthermore, we take advantage of hyperbolic embeddings to capture the hierarchical relations among fine-grained class-associated token embeddings through a coarse-to-fine metric learning strategy, which enhances the distinguishability of the learned output embeddings. Extensive experiments on three datasets under various settings demonstrate the effectiveness of our approach. Our code can be found at https://github.com/donttal/TARA.
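To make the two geometric ingredients named in the abstract concrete, the following is a minimal toy sketch and not the authors' implementation (see the linked repository for that). It assumes 2-D embeddings, a fixed rotation angle, and fixed per-dimension scales, whereas the paper learns its transformation. The function names `rotate_and_scale`, `cosine`, and `poincare_distance` are hypothetical and chosen only for illustration. The first part shows how rotating and then rescaling can pull apart two nearly parallel vectors; the second computes the standard Poincaré-ball distance, a common choice for hyperbolic embeddings.

```python
import math

def rotate_and_scale(x, theta, scale):
    """Rotate a 2-D vector by `theta` radians, then scale each dimension.

    Assumption: theta and scale are fixed here; in a real system they
    would be learned parameters.
    """
    c, s = math.cos(theta), math.sin(theta)
    rx = c * x[0] - s * x[1]
    ry = s * x[0] + c * x[1]
    return [scale[0] * rx, scale[1] * ry]

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    def sq_norm(w):
        return sum(a * a for a in w)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

# Two toy "embeddings" that are almost parallel (hard to tell apart).
a, b = [1.0, 0.9], [0.9, 1.0]
cos_before = cosine(a, b)

# Rotate so the direction in which they differ lies on the second axis,
# then stretch that axis.
theta, scale = -math.pi / 4, [1.0, 10.0]
a_new = rotate_and_scale(a, theta, scale)
b_new = rotate_and_scale(b, theta, scale)
cos_after = cosine(a_new, b_new)

# The same Euclidean gap of 0.1 measured near the origin and near the boundary.
d_center = poincare_distance([0.0, 0.0], [0.1, 0.0])
d_edge = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

In this toy setting `cos_after` is smaller than `cos_before`, i.e. the two vectors become easier to separate, and `d_edge` is larger than `d_center`, which reflects the property that makes hyperbolic space convenient for hierarchies: there is more room near the boundary for fine-grained classes.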