We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model to unfairly prioritize information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLMs such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on the performance and position bias of models for zero-shot summarization tasks.