Prompt-based classification reformulates tasks as cloze questions containing a [MASK] token, and the tokens predicted at the [MASK] position are then mapped to labels through pre-defined verbalizers. Recent studies have explored verbalizer embeddings as a way to reduce the manual effort this mapping requires. However, all existing approaches require tuning either the pre-trained model or additional trainable embeddings. Moreover, Euclidean distance is an unreliable measure between high-dimensional verbalizer embeddings, because the representation space may contain non-linear manifolds. In this study, we propose a tuning-free, manifold-based re-embedding method for verbalizer embeddings, called Locally Linear Embedding with Intra-class Neighborhood Constraint (LLE-INC), which preserves local properties within each class as guidance for classification. Experimental results show that, without tuning any parameters, LLE-INC performs on par with automated verbalizers that rely on parameter tuning. When parameters are updated, our approach further improves prompt-based tuning by up to 3.2%. Furthermore, experiments with LLaMA-7B and LLaMA-13B indicate that LLE-INC is an efficient tuning-free classification approach for very large language models.
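Below is a minimal Python/NumPy sketch of locally linear embedding (LLE) with an intra-class neighborhood constraint. It assumes the constraint means each embedding's k nearest neighbors are chosen only among embeddings with the same label. The function name `lle_intra_class`, the parameters `n_neighbors`, `n_components` and `reg`, and the trace-scaled regularization are illustrative choices, not taken from the LLE-INC paper. Euclidean distance is used only to pick local neighbors, which is how LLE approximates a non-linear manifold with locally linear patches.

```python
import numpy as np


def lle_intra_class(X, labels, n_neighbors=3, n_components=2, reg=1e-3):
    """Re-embed X with LLE, choosing neighbors only within each point's class.

    Hypothetical helper for illustration, not the LLE-INC reference code.
    X: (n, D) embeddings; labels: (n,) class ids.
    Returns the (n, n) reconstruction weights W and the (n, n_components) embedding Y.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    W = np.zeros((n, n))

    for i in range(n):
        # Intra-class constraint (assumed): candidates share x_i's label, excluding x_i.
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        if same.size == 0:
            raise ValueError("every class needs at least two points")
        dists = np.linalg.norm(X[same] - X[i], axis=1)  # Euclidean, local only
        nbrs = same[np.argsort(dists)[:n_neighbors]]

        # Local Gram matrix of the neighbors, centred on x_i.
        Z = X[nbrs] - X[i]
        C = Z @ Z.T
        # Regularize: C is singular when k > D or neighbors are collinear.
        C += np.eye(len(nbrs)) * reg * max(np.trace(C), 1e-12)
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()  # reconstruction weights sum to one

    # Embedding: bottom eigenvectors of M = (I - W)^T (I - W).
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    # Skip the first eigenvector (a zero-eigenvalue direction), as in standard LLE.
    Y = eigvecs[:, 1:n_components + 1]
    return W, Y
```

Because every point is reconstructed only from its own class, each class indicator vector lies in the null space of `M`, so `M` has at least one zero eigenvalue per class and the leading coordinates tend to encode class membership rather than within-class geometry. The sketch stops at the re-embedding step and does not show how a test instance would be scored.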