Prompt-tuning, a novel and effective fine-tuning paradigm built on large-scale pre-trained language models (PLMs), aims to reduce the gap between downstream tasks and pre-training objectives. Although prompt-tuning has yielded steady advances across various tasks, it still suffers from a persistent defect: prompt-tuning methods fail to generalize to specific few-shot patterns. From the perspective of distribution analysis, we show that two intrinsic issues underlie this phenomenon: the overabundant conceptual knowledge contained in PLMs and the incomplete knowledge available for target downstream domains. Together, these cause PLMs to mislocate the knowledge distributions corresponding to the target domains in the universal knowledge embedding space. To address this, we seek to approximate the complete target domains of downstream tasks in a debiased manner and then abstract these domains to generate discriminative prompts, thereby providing unambiguous guidance for PLMs. Guided by this intuition, we propose a simple yet effective approach, BayesPrompt, which learns prompts that contain domain-discriminative information and resist interference from domain-irrelevant knowledge. BayesPrompt first leverages known distributions to approximate the debiased factual distributions of target domains, and then uniformly samples representative features from the approximated distributions to generate the final prompts for PLMs. We provide theoretical insights that connect our approach to domain adaptation. Empirically, our method achieves state-of-the-art performance on benchmarks.
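The approximate-then-sample idea summarized in the abstract can be illustrated with a minimal sketch. This is not the BayesPrompt implementation: the abstract does not name the known distribution, so a diagonal Gaussian is assumed here purely for illustration, and the function names (`fit_diagonal_gaussian`, `sample_features`, `build_prompt_vector`) as well as the averaging step that turns samples into a single prompt vector are hypothetical.

```python
import random
from statistics import mean, pstdev


def fit_diagonal_gaussian(features):
    """Estimate a per-dimension mean and standard deviation from few-shot
    feature vectors. ASSUMPTION: a diagonal Gaussian stands in for the
    'known distribution'; the abstract does not specify which one is used."""
    dims = list(zip(*features))  # transpose: one tuple per dimension
    mus = [mean(d) for d in dims]
    sigmas = [pstdev(d) for d in dims]
    return mus, sigmas


def sample_features(mus, sigmas, n_samples, rng):
    """Draw n_samples feature vectors from the fitted distribution."""
    return [
        [rng.gauss(m, s) for m, s in zip(mus, sigmas)]
        for _ in range(n_samples)
    ]


def build_prompt_vector(samples):
    """HYPOTHETICAL aggregation: average the sampled features into a single
    vector that could serve as a soft-prompt embedding."""
    return [mean(d) for d in zip(*samples)]


# Usage with toy 2-dimensional few-shot features (illustrative values only).
few_shot = [[1.0, 2.0], [3.0, 4.0]]
mus, sigmas = fit_diagonal_gaussian(few_shot)
rng = random.Random(0)  # seeded for reproducibility
samples = sample_features(mus, sigmas, n_samples=5, rng=rng)
prompt_vector = build_prompt_vector(samples)
```

In practice the features would be PLM embeddings of the few-shot examples, and a richer distribution family would likely be needed to capture a multi-modal target domain; the sketch only shows the shape of the pipeline.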