Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs, owing to extensive context windows that allow them to process millions of tokens in a single forward pass. However, this paper uncovers a surprising limitation: LLMs fall short when handling long input sequences. We investigate this issue using three datasets and two tasks (sentiment analysis and news categorization) across a range of LLMs, including Claude 3, Gemini Pro, GPT-3.5 Turbo, Llama 3 Instruct, and Mistral Instruct. To address this limitation, we propose and evaluate ad hoc solutions that improve LLMs' performance on long input sequences by up to 50%, while reducing API cost and latency by up to 93% and 50%, respectively.
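The abstract does not specify what the ad hoc solutions are. Purely as an illustrative sketch of one generic mitigation consistent with the reported cost savings, the snippet below classifies a long document from a truncated prefix rather than the full text; the `classify` callable, the token budget, and the words-to-tokens ratio are hypothetical placeholders, not the paper's method.

```python
# Illustrative sketch only: the paper's ad hoc solutions are not described in
# this abstract. This shows one generic mitigation -- classifying a long
# document from a truncated prefix -- which also reduces the tokens billed
# per API call.

from typing import Callable

def truncate_to_budget(text: str, max_tokens: int, tokens_per_word: float = 1.3) -> str:
    """Keep roughly the first `max_tokens` tokens of `text`.

    Uses a crude words-to-tokens ratio; a real pipeline would count tokens
    with the target model's own tokenizer instead.
    """
    max_words = max(1, int(max_tokens / tokens_per_word))
    return " ".join(text.split()[:max_words])

def classify_long_input(
    text: str,
    classify: Callable[[str], str],  # hypothetical wrapper around any LLM API
    max_tokens: int = 512,           # hypothetical budget, not from the paper
) -> str:
    """Classify a long document from a short prefix instead of the full text."""
    return classify(truncate_to_budget(text, max_tokens))

# Usage, with a stand-in classifier for demonstration:
if __name__ == "__main__":
    dummy = lambda prompt: "positive"   # stand-in for a real LLM call
    review = "The movie was wonderful. " * 2000  # ~8k-word input
    print(classify_long_input(review, dummy))    # -> "positive"
```

Sending only a few hundred of several thousand input tokens is the kind of change that would make order-of-magnitude cost reductions, such as the 93% reported above, plausible.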