User as Code: Executable Memory for Personalized Agents

A personalized AI agent needs a user memory: a persistent model of who the user is, built across many conversations and consulted on each new one. Today this memory is almost always stored as unstructured text, a knowledge graph, or a flat store of facts, and consulted by retrieval -- fetching the entries most similar to the current request. Such "bag-of-facts" memory recalls individual facts well, but because storing a fact and acting on it are separate steps, it struggles to resolve contradictions, aggregate over many records, or enforce rules. We argue that user memory should instead be executable. We introduce User as Code (UaC), a paradigm in which an agent's model of a user is a living software project: typed Python objects hold the user's state and ordinary Python functions encode the rules that govern it, so representing and reasoning about the user happen in one medium an interpreter can run. The enabling mechanism is a two-phase pipeline: an append-only log that never discards a fact, periodically checkpointed into typed code. This changes what memory can do. On standard long-term conversation benchmarks, UaC matches both a full-context upper bound and the strongest prior memory systems on recall (78.8% on LOCOMO). Its advantage emerges where representation matters most. On aggregate questions over a user's history -- "how many international trips did I take last year?" -- retrieval-based memory collapses (6-43%) while UaC stays near-perfect (99%), because the answer is a one-line computation over typed state rather than a search over text. And because its rules execute deterministically whenever the state changes, UaC can surface unsolicited, safety-critical alerts -- such as a newly prescribed drug that conflicts with an allergy recorded months earlier -- a capability query-driven memory cannot provide.
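To make the idea concrete, the following is a minimal sketch of executable memory, not the UaC implementation itself. Every name in it (`Trip`, `UserState`, `Memory`, `allergy_conflicts`, `DRUG_CLASSES`, `international_trips_in`) is a hypothetical chosen for illustration, and the checkpoint step is reduced to a deterministic fold over new log entries, which is an assumption about a process the abstract describes only at a high level.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Trip:
    destination: str
    start: date
    international: bool


@dataclass
class UserState:
    """Typed state: the checkpointed model of the user."""
    allergies: set[str] = field(default_factory=set)
    medications: set[str] = field(default_factory=set)
    trips: list[Trip] = field(default_factory=list)


# Hypothetical drug -> allergen-class table, for illustration only.
DRUG_CLASSES = {"amoxicillin": "penicillin"}


def allergy_conflicts(state: UserState) -> list[str]:
    """Rule as an ordinary function: flag drugs that hit a recorded allergy."""
    alerts = []
    for drug in sorted(state.medications):
        allergen = DRUG_CLASSES.get(drug)
        if allergen in state.allergies:
            alerts.append(f"{drug} conflicts with recorded {allergen} allergy")
    return alerts


RULES = [allergy_conflicts]


@dataclass
class Memory:
    log: list[tuple[str, object]] = field(default_factory=list)  # append-only
    state: UserState = field(default_factory=UserState)
    applied: int = 0  # number of log entries already folded into state

    def record(self, kind: str, value: object) -> None:
        """Phase 1: append the fact; nothing is ever overwritten or dropped."""
        self.log.append((kind, value))

    def checkpoint(self) -> list[str]:
        """Phase 2 (assumed form): fold new entries into state, then run rules."""
        for kind, value in self.log[self.applied:]:
            if kind == "allergy":
                self.state.allergies.add(value)
            elif kind == "medication":
                self.state.medications.add(value)
            elif kind == "trip":
                self.state.trips.append(value)
        self.applied = len(self.log)
        return [alert for rule in RULES for alert in rule(self.state)]


def international_trips_in(state: UserState, year: int) -> int:
    """Aggregate question answered by computation over typed state."""
    return sum(t.international and t.start.year == year for t in state.trips)


mem = Memory()
mem.record("allergy", "penicillin")
mem.record("trip", Trip("Lisbon", date(2023, 5, 2), True))
mem.record("trip", Trip("Denver", date(2023, 8, 9), False))
mem.record("trip", Trip("Tokyo", date(2023, 11, 20), True))
mem.checkpoint()

mem.record("medication", "amoxicillin")  # arrives in a later conversation
print(mem.checkpoint())  # ['amoxicillin conflicts with recorded penicillin allergy']
print(international_trips_in(mem.state, 2023))  # 2
```

In this sketch the alert is produced by the checkpoint, not by a user query, and the trip count comes from a `sum` over typed records instead of a similarity search. Both behaviors follow from keeping state and rules in the same executable medium.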
