Large Language Models (LLMs) have transformed NLP with their remarkable In-context Learning (ICL) capabilities. Automated assistants based on LLMs are gaining popularity; however, adapting them to novel tasks is still challenging. While colossal models excel in zero-shot performance, their computational demands limit widespread use, and smaller language models struggle without context. This paper investigates whether LLMs can generalize from labeled examples of predefined tasks to novel tasks. Drawing inspiration from biological neurons and the mechanistic interpretation of the Transformer architecture, we explore the potential for information sharing across tasks. We design a cross-task prompting setup with three LLMs and show that LLMs achieve significant performance improvements despite having no examples from the target task in the context. Cross-task prompting leads to a remarkable performance boost of 107% for LLaMA-2 7B, 18.6% for LLaMA-2 13B, and 3.2% for GPT 3.5 on average over zero-shot prompting, and performs comparably to standard in-context learning. We further demonstrate the effectiveness of generating pseudo-labels for in-task examples, and our analyses reveal a strong correlation between the effect of cross-task examples and model activation similarities on source and target input tokens. This paper offers a first-of-its-kind exploration of LLMs' ability to solve novel tasks based on contextual signals from examples of different tasks.
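To make the cross-task prompting setup concrete, the sketch below shows one plausible way to assemble such a prompt: labeled demonstrations are drawn from a source task (here, sentiment classification), while the final query comes from a different target task (topic classification) with no target-task examples in the context. The function name, prompt template, and example texts are illustrative assumptions, not taken from the paper.

```python
def build_cross_task_prompt(source_examples, target_input):
    """Format labeled source-task examples followed by the unlabeled
    target-task query, mirroring a standard ICL prompt layout.

    Note: the "Input:/Output:" template is a hypothetical choice for
    illustration; the paper's actual prompt format may differ.
    """
    blocks = []
    for text, label in source_examples:
        blocks.append(f"Input: {text}\nOutput: {label}")
    # The target-task query is appended with an empty label slot for the
    # model to complete.
    blocks.append(f"Input: {target_input}\nOutput:")
    return "\n\n".join(blocks)


# Source task: sentiment classification (labeled demonstrations).
source = [
    ("The movie was a delight from start to finish.", "positive"),
    ("A dull, plodding mess of a film.", "negative"),
]

# Target task: topic classification (query only, no labeled examples).
prompt = build_cross_task_prompt(source, "Stocks rallied after the rate cut.")
print(prompt)
```

The key property is that the in-context signal comes entirely from a different task's input-label pairs, which is what distinguishes this setup from standard in-context learning.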