In the in-context learning (ICL) setup, label bias can take several forms. One of them is majority label bias, which arises when the distribution of labeled examples among the in-context samples is skewed towards one or more classes, making large language models (LLMs) more prone to predicting those labels. Such skew can stem from logistical constraints, inherent biases in data collection methods, or limited access to diverse data sources, and is often unavoidable in real-world industry settings. In this work, we study the robustness of in-context learning in LLMs to shifts caused by majority label bias in text classification tasks. Prior work has shown that in-context learning with LLMs is susceptible to such biases. We go one level deeper and show that the robustness boundary varies widely across models and tasks, with certain LLMs being highly robust (~90%) to majority label bias. Our findings also highlight how model size and the richness of instructional prompts contribute to model robustness. We restrict our study to publicly available open-source models to ensure transparency and reproducibility.
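To make the notion of majority label bias concrete, the sketch below builds a few-shot prompt whose demonstrations are deliberately skewed towards one class. It is only an illustration under stated assumptions: the toy sentiment pool, the function names `sample_skewed_demos` and `build_prompt`, and the prompt template are hypothetical and are not taken from the study's experimental setup.

```python
import random
from collections import Counter


def sample_skewed_demos(pool, majority_label, majority_frac, k, seed=0):
    """Draw k demonstrations so that about majority_frac of them carry majority_label.

    pool is a list of (text, label) pairs. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    majority = [ex for ex in pool if ex[1] == majority_label]
    minority = [ex for ex in pool if ex[1] != majority_label]
    n_major = round(k * majority_frac)
    n_minor = k - n_major
    if n_major > len(majority) or n_minor > len(minority):
        raise ValueError("pool is too small for the requested label mix")
    demos = rng.sample(majority, n_major) + rng.sample(minority, n_minor)
    rng.shuffle(demos)  # avoid grouping all majority examples together
    return demos


def build_prompt(demos, query,
                 instruction="Classify the sentiment of the text as positive or negative."):
    """Format demonstrations and the query into a single few-shot prompt (assumed template)."""
    lines = [instruction, ""]
    for text, label in demos:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)


# Toy pool: 10 examples per class (placeholder texts, not real data).
pool = [(f"toy positive example {i}", "positive") for i in range(10)]
pool += [(f"toy negative example {i}", "negative") for i in range(10)]

# 8 demonstrations, 75% of them from the majority class "positive".
demos = sample_skewed_demos(pool, "positive", 0.75, k=8)
label_counts = Counter(label for _, label in demos)
prompt = build_prompt(demos, "the plot was thin but the acting was superb")
```

Sweeping `majority_frac` from a balanced mix towards 1.0 and tracking accuracy on a fixed test set is one way to probe how far the label distribution can shift before predictions degrade.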