Text files such as skill files, memory files, and behavioral configuration files play a central role in defining how modern agents act. Edited by humans or by the agents themselves, these files can evolve over time and directly steer the agent's behavior in future interactions. We present a methodology and framework for measuring agent traits, where a trait is defined as a direction in the embedding space of a text embedding model. We train a linear model on labeled "before" versus "after" skill file diffs to learn a trait vector, then score arbitrary skill edits by projecting their embedding diffs onto this vector. Evaluated on 68 labeled skill diff pairs for one trait, the propensity to seek sensitive data, our method achieves 91.2% sign classification accuracy and a Spearman rank correlation of $\rho = 0.82$ under leave-one-out cross-validation. We build this trait evaluation into a broader agent-to-agent protocol that enables one agent to evaluate another's skill file updates through a trusted intermediary.
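The scoring pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the ridge-regression fit, the function names (`fit_trait_vector`, `score_edit`, `leave_one_out_sign_accuracy`), the 16-dimensional embeddings, and the ±1 labels are all assumptions made for illustration, and the embeddings are synthetic stand-ins for the output of a real text embedding model. Only the count of 68 pairs is taken from the text; the accuracy this toy data produces says nothing about the reported results.

```python
import numpy as np


def fit_trait_vector(diffs, labels, ridge=1.0):
    """Fit a linear model (ridge regression, no intercept) on embedding diffs.

    Assumption: the abstract only says "a linear model"; ridge is one choice.
    Solves w = (X^T X + ridge * I)^{-1} X^T y.
    """
    X = np.asarray(diffs, dtype=float)
    y = np.asarray(labels, dtype=float)
    dim = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(dim), X.T @ y)


def score_edit(before_emb, after_emb, trait_vector):
    """Project the embedding diff (after minus before) onto the trait vector."""
    diff = np.asarray(after_emb, dtype=float) - np.asarray(before_emb, dtype=float)
    return float(diff @ trait_vector)


def leave_one_out_sign_accuracy(diffs, labels, ridge=1.0):
    """Hold out each pair in turn, refit, and check the sign of its score."""
    X = np.asarray(diffs, dtype=float)
    y = np.asarray(labels, dtype=float)
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w = fit_trait_vector(X[keep], y[keep], ridge)
        hits += int(np.sign(X[i] @ w) == np.sign(y[i]))
    return hits / len(y)


# Synthetic stand-in data (hypothetical): each "edit" shifts the embedding
# along a hidden direction, with the sign given by its label, plus noise.
rng = np.random.default_rng(0)
dim, n_pairs = 16, 68
hidden_direction = rng.normal(size=dim)
hidden_direction /= np.linalg.norm(hidden_direction)

labels = rng.choice([-1.0, 1.0], size=n_pairs)
before = rng.normal(size=(n_pairs, dim))
after = (before
         + labels[:, None] * hidden_direction
         + 0.1 * rng.normal(size=(n_pairs, dim)))
diffs = after - before

trait_vector = fit_trait_vector(diffs, labels)
loo_accuracy = leave_one_out_sign_accuracy(diffs, labels)
example_score = score_edit(before[0], after[0], trait_vector)

print(f"leave-one-out sign accuracy on synthetic data: {loo_accuracy:.3f}")
print(f"score of first edit: {example_score:+.3f} (label {labels[0]:+.0f})")
```

A rank correlation between held-out scores and graded labels could be computed in the same leave-one-out loop; it is omitted here to keep the sketch dependent on NumPy alone.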