Large language models have demonstrated a surprising ability to perform in-context learning: they can be applied directly to numerous downstream tasks by conditioning on a prompt constructed from a few input-output examples. However, prior research has shown that in-context learning can be highly unstable under variations in the training examples, example order, and prompt format. Constructing an appropriate prompt is therefore essential for improving in-context learning performance. In this paper, we revisit this problem from the perspective of predictive bias. Specifically, we introduce a metric that evaluates the predictive bias of a fixed prompt with respect to labels or given attributes. We then show empirically that prompts with higher bias always lead to unsatisfactory predictive quality. Based on this observation, we propose a novel greedy search strategy to identify a near-optimal prompt for in-context learning. We perform comprehensive experiments with state-of-the-art mainstream models such as GPT-3 on various downstream tasks. Our results indicate that our method enhances the model's in-context learning performance in an effective and interpretable manner.
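To make the idea of bias-guided greedy prompt search concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not define the bias metric, so this sketch assumes one: the KL divergence between the model's label distribution on a content-free input and the uniform distribution. The names `bias_score`, `greedy_prompt_search`, and `toy_predict_proba` are hypothetical, and the toy predictor is a stand-in for a real model call.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label)
LABELS = ("pos", "neg")    # hypothetical label set for this illustration


def bias_score(probs: Sequence[float]) -> float:
    """Assumed metric: KL divergence from the uniform distribution.

    Returns 0.0 when the label probabilities are perfectly balanced
    and grows as the prediction leans toward one label.
    """
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0)


def toy_predict_proba(prompt: List[Example]) -> List[float]:
    """Stand-in for a model queried with a content-free input.

    It simply mimics the (add-one smoothed) label frequencies of the
    demonstrations; a real system would call a language model here.
    """
    counts = [1 + sum(1 for _, y in prompt if y == label) for label in LABELS]
    total = sum(counts)
    return [c / total for c in counts]


def greedy_prompt_search(
    candidates: List[Example],
    predict_proba: Callable[[List[Example]], Sequence[float]],
    max_examples: int,
) -> Tuple[List[Example], float]:
    """Greedily add the demonstration that yields the lowest bias score.

    Stops when no remaining candidate lowers the score any further
    or when max_examples demonstrations have been selected.
    """
    prompt: List[Example] = []
    remaining = list(candidates)
    best = float("inf")
    while remaining and len(prompt) < max_examples:
        scored = [
            (bias_score(predict_proba(prompt + [ex])), i)
            for i, ex in enumerate(remaining)
        ]
        score, idx = min(scored)
        if score >= best:  # no strict improvement: stop searching
            break
        best = score
        prompt.append(remaining.pop(idx))
    return prompt, best


candidates = [("great", "pos"), ("awful", "neg"), ("fine", "pos"), ("bad", "neg")]
prompt, score = greedy_prompt_search(candidates, toy_predict_proba, max_examples=4)
print(prompt, score)
```

With the toy predictor, the search stops once it has selected one demonstration per label, since any further addition would unbalance the predicted distribution. Replacing `toy_predict_proba` with a real model query is the only change needed to try the same loop on an actual language model, under the assumed metric.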