Scaling language models to longer contexts is essential for capturing rich dependencies across extended discourse. However, naïve context extension imposes significant computational and memory burdens, often resulting in inefficiencies during both training and inference. In this work, we propose CCF, a novel context compression framework designed to enable efficient long-context modeling by learning hierarchical latent representations that preserve global semantics while aggressively reducing input redundancy. CCF integrates segment-wise semantic aggregation with key-value memory encoding, forming compact representations that support accurate reconstruction and long-range understanding. To further enhance scalability, we introduce a training-efficient optimization strategy that couples incremental segment decoding with sparse reservoir sampling, substantially reducing memory overhead without degrading performance. Empirical results on multiple long-context language modeling benchmarks demonstrate that CCF achieves competitive perplexity under high compression ratios, and significantly improves throughput and memory efficiency compared to existing approaches. These findings highlight the potential of structured compression for scalable and effective long-context language modeling.
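To make the high-level description above concrete, the sketch below illustrates one plausible reading of segment-wise semantic aggregation into a compact key-value memory: each fixed-length segment of hidden states is pooled by a small set of learned latent slots, and the resulting slot representations are projected into keys and values that a decoder could attend to in place of the full context. All names here (`SegmentKVCompressor`, `compress_long_context`, `seg_len`, `n_slots`) are illustrative assumptions, not the authors' implementation; the actual CCF architecture may differ substantially.

```python
import torch
import torch.nn as nn


class SegmentKVCompressor(nn.Module):
    """Hypothetical sketch: compress one segment of hidden states into a
    small set of latent key-value slots via cross-attention pooling."""

    def __init__(self, d_model: int, n_slots: int, n_heads: int = 8):
        super().__init__()
        # Learned latent slots act as queries that summarize the segment.
        self.slots = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, segment: torch.Tensor):
        # segment: (batch, seg_len, d_model) hidden states for one segment.
        batch = segment.size(0)
        queries = self.slots.unsqueeze(0).expand(batch, -1, -1)
        pooled, _ = self.attn(queries, segment, segment)  # (batch, n_slots, d_model)
        # Compact key/value entries standing in for the full segment.
        return self.k_proj(pooled), self.v_proj(pooled)


def compress_long_context(hidden: torch.Tensor,
                          compressor: SegmentKVCompressor,
                          seg_len: int):
    """Split a long sequence into segments and concatenate their compressed
    KV slots, yielding a memory far shorter than the original sequence."""
    keys, values = [], []
    for start in range(0, hidden.size(1), seg_len):
        k, v = compressor(hidden[:, start:start + seg_len])
        keys.append(k)
        values.append(v)
    return torch.cat(keys, dim=1), torch.cat(values, dim=1)
```

Under these illustrative settings, a segment of length 512 compressed into 16 slots would shrink the per-segment key-value footprint by a factor of 32; the compression ratios and mechanisms actually used by CCF are specified in the paper, not here.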