Batch prompting is a common technique for large language models (LLMs) that processes multiple inputs in a single prompt to improve computational efficiency. However, as batch sizes grow, performance often degrades because the model struggles with long context inputs. Existing mitigation methods rely solely on batch data arrangement and majority voting rather than improving the design of the batch prompt itself. In this paper, we address these limitations by proposing "Auto-Demo Prompting," a novel approach that leverages the question-output pairs from earlier questions within a batch as demonstrations for subsequent answer inference. We provide a formal theoretical analysis of how Auto-Demo Prompting functions within the autoregressive generation process of LLMs, illustrating how it uses prior outputs to optimize the model's internal representations. Our method effectively bridges the gap between batch prompting and few-shot prompting, enhancing performance at only a modest cost in token usage. Experimental results across five NLP tasks demonstrate its effectiveness in mitigating performance degradation, with the method occasionally outperforming single-question prompts. Furthermore, it opens new avenues for applying few-shot learning techniques, such as demonstration selection, within batch prompting, making it a robust solution for real-world applications.
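The core mechanism can be sketched in code. The following is a minimal, hedged illustration (the function names, prompt format, and `answer_fn` stub are our own assumptions, not the paper's exact implementation): because LLM generation is left-to-right, the answer emitted for question *i* already sits in the context when question *i+1* is answered, and Auto-Demo Prompting exploits these earlier question-answer pairs as in-context demonstrations.

```python
# Illustrative sketch of Auto-Demo Prompting (assumed prompt format,
# not the paper's exact implementation).

def build_batch_prompt(questions):
    """Interleave each question with an answer slot, so that answers
    produced earlier in the batch act as in-context demonstrations."""
    lines = ["Answer each question in order. Earlier answers serve as examples."]
    for i, q in enumerate(questions, 1):
        lines.append(f"Q{i}: {q}")
        lines.append(f"A{i}:")  # filled in before the model reaches Q{i+1}'s slot
    return "\n".join(lines)

def run_batch(questions, answer_fn):
    """Simulate the autoregressive pass: answer_fn (a stand-in for an LLM
    call) sees the growing context, including all prior question-answer
    pairs, when producing each new answer."""
    context = build_batch_prompt(questions)
    answers = []
    for i, q in enumerate(questions, 1):
        a = answer_fn(context, q)
        # The emitted answer joins the context, becoming an automatic
        # demonstration for every subsequent question in the batch.
        context = context.replace(f"A{i}:", f"A{i}: {a}", 1)
        answers.append(a)
    return answers
```

A single-prompt baseline would instead issue one call per question with no shared context; the sketch above shows why batch prompting plus Auto-Demo recovers few-shot behavior for free, since each question after the first is effectively answered with *i-1* demonstrations already in context.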