Prompt-based classification recasts a task as a cloze question built around the [MASK] token; the tokens predicted for that position are then mapped to labels through pre-defined verbalizers. Recent studies have explored verbalizer embeddings to reduce the manual effort this mapping requires. However, all existing approaches require tuning either the pre-trained model or additional trainable embeddings. Moreover, Euclidean distance is not an appropriate measure of the distance between high-dimensional verbalizer embeddings, because the representation space may contain non-linear manifolds. In this study, we propose a tuning-free, manifold-based space re-embedding method for verbalizer embeddings, Locally Linear Embedding with Intra-class Neighborhood Constraint (LLE-INC), which preserves local properties within the same class as guidance for classification. Experimental results indicate that, even without tuning any parameters, LLE-INC is on par with automated verbalizers that rely on parameter tuning. With parameter updating, our approach further improves prompt-based tuning by up to 3.2%. Furthermore, experiments with LLaMA-7B and LLaMA-13B indicate that LLE-INC is an efficient tuning-free classification approach for hyper-scale language models.
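To make the idea of an intra-class neighborhood constraint more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' LLE-INC algorithm, whose details the abstract does not give. It only illustrates the underlying intuition under stated assumptions: a query embedding is reconstructed from its nearest neighbors within each class using standard LLE reconstruction weights, and the class with the smallest reconstruction error is chosen. The function names, the regularization constant `reg`, and the neighbor count `k` are all hypothetical choices made for this illustration.

```python
import numpy as np


def lle_weights(x, neighbors, reg=1e-3):
    """Return weights (summing to 1) that best reconstruct x from its neighbors.

    This is the standard LLE weight step: minimize ||x - sum_j w_j n_j||^2
    subject to sum_j w_j = 1, with a small ridge term for numerical stability.
    """
    diffs = neighbors - x                  # shape (k, d)
    gram = diffs @ diffs.T                 # local Gram matrix, shape (k, k)
    trace = np.trace(gram)
    # Regularize so the system stays solvable when neighbors are collinear.
    gram = gram + reg * (trace if trace > 0 else 1.0) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()


def classify_by_intra_class_reconstruction(query, embeddings, labels, k=3):
    """Pick the class whose own members reconstruct the query with least error.

    Assumption for this sketch: neighbors are drawn only from one class at a
    time (the "intra-class" constraint), selected by local Euclidean distance.
    """
    best_label, best_err = None, np.inf
    for label in np.unique(labels):
        members = embeddings[labels == label]
        dists = np.linalg.norm(members - query, axis=1)
        nearest = members[np.argsort(dists)[:k]]
        w = lle_weights(query, nearest)
        err = np.linalg.norm(query - w @ nearest)
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```

Nothing in this sketch is trained: the only computation is a small linear solve per class, which is in the spirit of the tuning-free setting described above. In practice the embeddings would come from a language model's representation at the [MASK] position, which this sketch leaves out.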