If an AI agent makes decisions on a person's behalf, those decisions must align with that person. We introduce representational accuracy, a measure of how faithfully a system captures a person's interpretation. We operationalize the interpretive layer as a Behavioral Specification. Our reference implementation aggressively compresses a person's data into interpretive patterns, which are served as context to a language model. We evaluate the Specification on a prototype benchmark of held-out behavioral predictions scored by a calibrated 5-judge LLM panel. We test it both on its own and in composition with a range of context conditions: the full raw corpus, the full set of extracted facts, and four commercial memory systems (Mem0, Letta, Supermemory, Zep). Across 14 public-domain autobiographical corpora, the Specification lifts representational accuracy in aggregate and nearly eliminates model hedging. It recovers most of what the raw corpus delivers at roughly 1/25 of the context cost. The Specification lifts subjects toward a common predictive level regardless of their pretraining baseline; the lift in absolute points is therefore largest where the baseline is lowest, suggesting that the relevant population is anyone not adequately represented in pretraining. Lift is greatest on interpretation-required questions, where the interpretive layer enables model behavior that neither extracted facts nor the raw corpus elicit. Conversely, on recall-required questions, the layer can interfere rather than help. We conclude that representational accuracy is distinct from recall and that human-AI alignment depends on how accurately the user is represented. Representational accuracy makes that alignment testable.