Long-context inputs in large language models (LLMs) often suffer from the "lost in the middle" problem, where critical information becomes diluted or ignored due to excessive length. Context compression methods aim to address this by reducing input size, but existing approaches struggle to balance information preservation against compression efficiency. We propose the Adaptive Task-Aware Compressor (ATACompressor), which dynamically adjusts compression based on the specific requirements of the task. ATACompressor employs a selective encoder that compresses only the task-relevant portions of long contexts, ensuring that essential information is preserved while unnecessary content is reduced. Its adaptive allocation controller perceives the length of relevant content and adjusts the compression rate accordingly, optimizing resource utilization. We evaluate ATACompressor on three QA datasets (HotpotQA, MS MARCO, and SQuAD), showing that it outperforms existing methods in both compression efficiency and task performance. Our approach provides a scalable solution for long-context processing in LLMs. Furthermore, we perform a range of ablation studies and analysis experiments to gain deeper insights into the key components of ATACompressor.
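The selective-encoding and adaptive-allocation ideas above can be illustrated with a minimal sketch. All function names, scores, and thresholds here are illustrative assumptions for exposition, not the paper's actual API or algorithm:

```python
# Hypothetical sketch of the two ideas in the abstract:
# (1) keep only task-relevant segments, (2) set the compression
# rate from the length of the relevant content vs. a token budget.
# Names, scores, and thresholds are assumptions, not the paper's API.

def select_relevant(segments, scores, threshold=0.5):
    """Selective encoding: keep segments whose task-relevance
    score (e.g., from a query-segment scorer) passes a threshold."""
    return [seg for seg, sc in zip(segments, scores) if sc >= threshold]

def adaptive_compression_rate(relevant_tokens, budget_tokens):
    """Adaptive allocation: compress harder (lower keep-rate) when
    the relevant content exceeds the available token budget."""
    if relevant_tokens <= budget_tokens:
        return 1.0  # fits as-is, no compression needed
    return budget_tokens / relevant_tokens

# Toy usage: three retrieved passages with mock relevance scores.
segments = ["passage A ...", "passage B ...", "passage C ..."]
scores = [0.9, 0.2, 0.7]
relevant = select_relevant(segments, scores)   # drops passage B
rate = adaptive_compression_rate(relevant_tokens=1200, budget_tokens=300)
print(relevant, rate)  # keep-rate 0.25: retain ~1 in 4 tokens
```

The key design point is that the rate is not fixed: a query with little relevant context is compressed lightly (or not at all), while one with long relevant context is compressed more aggressively to fit the budget.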