Autoencoding-Free Context Compression for LLMs via Contextual Semantic Anchors

Context compression is a promising approach to accelerating large language model (LLM) inference by compressing long contexts into compact representations. Current context compression methods predominantly rely on autoencoding tasks to train context-agnostic compression tokens to compress contextual semantics. Although autoencoding gives compression tokens the ability to compress, it creates a fundamental mismatch: the model is optimized for a reconstruction objective that diverges from actual downstream tasks, which weakens the features most useful in real-world usage. We propose Semantic-Anchor Compression (SAC), a novel method that shifts from autoencoding-based compression to an architecture equipped with compression capability a priori. Instead of training models to compress contexts through autoencoding tasks, SAC directly selects so-called anchor tokens from the original context and aggregates contextual information into their key-value (KV) representations. Because the representations are derived directly from the contextual tokens, SAC eliminates the need for autoencoding training. To preserve compression performance while directly leveraging anchor tokens, SAC incorporates two key designs: (1) anchor embeddings that enable the compressor to identify critical tokens, and (2) a bidirectional attention modification that allows anchor tokens to capture information from the entire context. Experimental results demonstrate that SAC consistently outperforms existing context compression methods across various compression ratios. On out-of-distribution evaluation using MRQA, SAC achieves a 1-point exact match (EM) improvement over strong baselines at 5x compression, with increasing advantages at higher compression ratios.
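
The pipeline described above (pick anchor tokens, mark them with an anchor embedding, let them attend to the whole context, then keep only their KV entries) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how anchors are chosen or exactly how attention is modified, so the uniform one-anchor-per-window rule, the mask layout, and every function name below are assumptions made purely for illustration.

```python
# Illustrative sketch only. Anchor selection rule, mask layout, and all
# function names are hypothetical; the abstract does not specify them.

def select_anchor_positions(context_len, ratio):
    """Assumed rule: the last token of each window of `ratio` tokens is an anchor."""
    if context_len < 0 or ratio < 1:
        raise ValueError("context_len must be >= 0 and ratio must be >= 1")
    return [min(start + ratio - 1, context_len - 1)
            for start in range(0, context_len, ratio)]


def add_anchor_embedding(embeddings, anchors, anchor_vec):
    """Add one shared (learned, in a real model) vector to the embeddings of anchor tokens."""
    anchor_set = set(anchors)
    return [
        [x + a for x, a in zip(row, anchor_vec)] if i in anchor_set else list(row)
        for i, row in enumerate(embeddings)
    ]


def build_attention_mask(context_len, anchors):
    """mask[q][k] is True if query q may attend to key k.

    Assumed reading of the bidirectional modification: ordinary tokens stay
    causal, while anchor tokens may also attend to later positions.
    """
    anchor_set = set(anchors)
    return [
        [(k <= q) or (q in anchor_set) for k in range(context_len)]
        for q in range(context_len)
    ]


def keep_anchor_kv(kv_cache, anchors):
    """Compressed context: only the KV entries at anchor positions are retained."""
    return [kv_cache[i] for i in anchors]


if __name__ == "__main__":
    anchors = select_anchor_positions(context_len=10, ratio=5)
    print(anchors)  # positions kept at 5x compression
    print(keep_anchor_kv([f"kv{i}" for i in range(10)], anchors))
```

Under these assumptions, a 10-token context at 5x compression retains two KV entries, and no reconstruction (autoencoding) objective appears anywhere in the pipeline; any training signal would have to come from elsewhere.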
