As large language models engage in extended reasoning tasks, they accumulate significant state -- architectural mappings, trade-off decisions, codebase conventions -- within the context window. This understanding is lost when sessions reach context limits and undergo lossy compaction. We propose Contextual Memory Virtualisation (CMV), a system that treats accumulated LLM understanding as version-controlled state. Borrowing from operating-system virtual memory, CMV models session history as a directed acyclic graph (DAG) with formally defined snapshot, branch, and trim primitives that enable context reuse across independent parallel sessions. We introduce a three-pass, structurally lossless trimming algorithm that preserves every user message and assistant response verbatim while stripping mechanical bloat such as raw tool outputs, base64 images, and metadata, reducing token counts by a mean of 20% and by up to 86% in sessions with significant overhead. A single-user case study across 76 real-world coding sessions demonstrates that trimming remains economically viable under prompt caching, with the strongest gains in mixed tool-use sessions, which average a 39% reduction and reach break-even within 10 turns. A reference implementation is available at https://github.com/CosmoNaught/claude-code-cmv.
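The session-as-DAG model with snapshot, branch, and trim primitives can be sketched as follows. This is a minimal illustration under assumed semantics, not the reference implementation; all names (`Node`, `Session`, `trim`) are hypothetical, and trimming is reduced to a single rule (dropping raw tool-output turns) for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """One turn of session history; parent pointers form the DAG."""
    content: dict                       # e.g. {"role": "user", "text": "..."}
    parent: Optional["Node"] = None

class Session:
    """Session history as an append-only chain of immutable nodes."""
    def __init__(self, head: Optional[Node] = None):
        self.head = head

    def append(self, content: dict) -> Node:
        self.head = Node(content, parent=self.head)
        return self.head

    def snapshot(self) -> Node:
        """Capture the current head as a reusable reference point."""
        return self.head

    def branch(self) -> "Session":
        """New independent session sharing history structurally (no copy)."""
        return Session(head=self.head)

def trim(node: Optional[Node]) -> Optional[Node]:
    """Rebuild the chain, dropping mechanical bloat (here: raw tool
    outputs) while keeping user and assistant turns verbatim."""
    if node is None:
        return None
    kept_parent = trim(node.parent)
    if node.content.get("role") == "tool":  # strip tool-output nodes
        return kept_parent
    return Node(node.content, parent=kept_parent)
```

Because nodes are immutable and parents are shared, `branch` is O(1) and a trimmed chain coexists with the original, so independent parallel sessions can diverge from any snapshot without copying history.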