Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a novel KV cache control framework designed to enable pre-trained LLMs to efficiently manage extensive sequences within fixed memory constraints, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that compresses and retains essential information through novel importance metrics, effectively maintaining critical data even without access to future context. Our comprehensive evaluations indicate that InfiniPot significantly outperforms models trained for long contexts in various NLP tasks, establishing its efficacy and versatility. This work represents a substantial advancement toward making LLMs applicable to a broader range of real-world scenarios.
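To make the idea of bounded-memory cache control concrete, the following is a minimal Python sketch of the general pattern: process a long input chunk by chunk and, after each chunk, prune the KV cache back to a fixed budget using an importance score. The scoring function (key-vector norm), budget size, function names, and the random stand-in for model outputs are illustrative assumptions for exposition only; they are not the paper's actual CCD procedure or importance metrics.

```python
import numpy as np

def compress_kv_cache(keys, values, scores, budget):
    """Keep only the `budget` most important KV entries (illustrative).

    keys, values: (seq_len, head_dim) arrays for one attention head.
    scores:       (seq_len,) importance score per cached token.
    budget:       maximum number of KV entries to retain.
    """
    if keys.shape[0] <= budget:
        return keys, values
    # Indices of the top-`budget` tokens by importance, kept in original order.
    keep = np.sort(np.argsort(scores)[-budget:])
    return keys[keep], values[keep]

def continual_compression(chunk_lengths, budget, head_dim=64,
                          rng=np.random.default_rng(0)):
    """Process an arbitrarily long stream chunk by chunk, compressing the
    cache back to `budget` entries after each chunk, so peak memory stays
    bounded regardless of total input length (illustrative loop only)."""
    k_cache = np.empty((0, head_dim))
    v_cache = np.empty((0, head_dim))
    for chunk_len in chunk_lengths:
        # Stand-in for running the model on the next chunk of tokens.
        k_new = rng.standard_normal((chunk_len, head_dim))
        v_new = rng.standard_normal((chunk_len, head_dim))
        k_cache = np.concatenate([k_cache, k_new])
        v_cache = np.concatenate([v_cache, v_new])
        # Illustrative importance score: L2 norm of each cached key vector.
        scores = np.linalg.norm(k_cache, axis=1)
        k_cache, v_cache = compress_kv_cache(k_cache, v_cache, scores, budget)
    return k_cache, v_cache

k, v = continual_compression(chunk_lengths=[512, 512, 512], budget=256)
print(k.shape, v.shape)  # (256, 64) (256, 64): cache size stays at the budget
```

The key property this sketch shares with the approach described above is that compression happens continually as the input streams in, so no step ever requires access to future context or to the full uncompressed sequence.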