We investigate the role of various demonstration components in the in-context learning (ICL) performance of large language models (LLMs). Specifically, we explore the impacts of ground-truth labels, input distribution, and complementary explanations, particularly when these are altered or perturbed. We build on previous work, which offers mixed findings on how these elements influence ICL. To probe these questions, we employ explainable NLP (XNLP) methods and use saliency maps of contrastive demonstrations for both qualitative and quantitative analysis. Our findings reveal that flipping ground-truth labels significantly affects the saliency, an effect that is more noticeable in larger LLMs. Our analysis of the input distribution at a granular level reveals that replacing sentiment-indicative terms with neutral ones in a sentiment analysis task does not have as substantial an impact as altering ground-truth labels. Finally, we find that the effectiveness of complementary explanations in boosting ICL performance is task-dependent, with limited benefits seen in sentiment analysis tasks compared to symbolic reasoning tasks. These insights are critical for understanding the functionality of LLMs and guiding the development of effective demonstrations, which is increasingly relevant in light of the growing use of LLMs in applications such as ChatGPT. Our research code is publicly available at https://github.com/paihengxu/XICL.
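To make the notion of contrastive demonstrations concrete, the following is a minimal sketch of how the two sentiment-analysis perturbations described above (flipping ground-truth labels and replacing sentiment-indicative terms with neutral ones) might be constructed. It is an illustration under stated assumptions, not the implementation in the linked repository: the `Demo` class, the prompt template, the helper names, and the small `NEUTRAL` word map are all hypothetical, and the saliency computation itself is not shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Demo:
    """One in-context demonstration: an input text and its label (hypothetical type)."""
    text: str
    label: str


# Assumed binary label set and a toy sentiment-to-neutral word map (illustrative only).
FLIP = {"positive": "negative", "negative": "positive"}
NEUTRAL = {"loved": "watched", "terrible": "ordinary", "great": "average"}


def flip_labels(demos):
    """Contrastive variant 1: keep the inputs, flip every ground-truth label."""
    return [replace(d, label=FLIP[d.label]) for d in demos]


def neutralize_inputs(demos):
    """Contrastive variant 2: keep the labels, swap sentiment-indicative words.

    Uses naive whitespace tokenization, so words with attached punctuation
    are left unchanged.
    """
    out = []
    for d in demos:
        words = [NEUTRAL.get(w.lower(), w) for w in d.text.split()]
        out.append(replace(d, text=" ".join(words)))
    return out


def build_prompt(demos, query):
    """Concatenate demonstrations and the test query using an assumed template."""
    blocks = [f"Review: {d.text}\nSentiment: {d.label}" for d in demos]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


demos = [
    Demo("I loved this movie", "positive"),
    Demo("The plot was terrible", "negative"),
]
query = "A great cast"

original = build_prompt(demos, query)
flipped = build_prompt(flip_labels(demos), query)
neutral = build_prompt(neutralize_inputs(demos), query)
```

Each perturbed prompt differs from `original` only in the component under study, so a saliency map computed on the pair can be compared position by position.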