MTI: A Behavior-Based Temperament Profiling System for AI Agents

AI models of equivalent capability can exhibit fundamentally different behavioral patterns, yet no standardized instrument exists to measure these dispositional differences. Existing approaches either borrow human personality dimensions and rely on self-report (which diverges from actual behavior in LLMs) or treat behavioral variation as a defect rather than a trait. We introduce the Model Temperament Index (MTI), a behavior-based profiling system that measures AI agent temperament across four axes: Reactivity (environmental sensitivity), Compliance (instruction-behavior alignment), Sociality (relational resource allocation), and Resilience (stress resistance). Grounded in the Four Shell Model from Model Medicine, MTI measures what agents do, not what they say about themselves, using structured examination protocols with a two-stage design that separates capability from disposition. We profile 10 small language models (1.7B-9B parameters, 6 organizations, 3 training paradigms) and report five principal findings: (1) the four axes are largely independent among instruction-tuned models (all |r| < 0.42); (2) within-axis facet dissociations are empirically confirmed -- Compliance decomposes into fully independent formal and stance facets (r = 0.002), while Resilience decomposes into inversely related cognitive and adversarial facets; (3) a Compliance-Resilience paradox reveals that opinion-yielding and fact-vulnerability operate through independent channels; (4) RLHF reshapes temperament not only by shifting axis scores but by creating within-axis facet differentiation absent in the unaligned base model; and (5) temperament is independent of model size (1.7B-9B), confirming that MTI measures disposition rather than capability.
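The axis-independence claim (all |r| < 0.42) rests on pairwise correlations between per-model axis scores. The following is a minimal sketch of how such a check could be run, not the authors' analysis code: the `pearson` and `axis_correlations` helpers, the 0-1 score scale, and the placeholder profiles are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from itertools import combinations
from math import sqrt

# The four MTI axes named in the abstract.
AXES = ("Reactivity", "Compliance", "Sociality", "Resilience")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def axis_correlations(profiles):
    """Return {(axis_a, axis_b): r} for every pair of axes.

    `profiles` is a list of dicts mapping axis name -> score, one per model.
    """
    result = {}
    for a, b in combinations(AXES, 2):
        result[(a, b)] = pearson([p[a] for p in profiles],
                                 [p[b] for p in profiles])
    return result


# Hypothetical placeholder profiles (NOT data from the paper).
profiles = [
    {"Reactivity": 0.2, "Compliance": 0.7, "Sociality": 0.5, "Resilience": 0.4},
    {"Reactivity": 0.6, "Compliance": 0.3, "Sociality": 0.8, "Resilience": 0.5},
    {"Reactivity": 0.4, "Compliance": 0.9, "Sociality": 0.1, "Resilience": 0.7},
    {"Reactivity": 0.8, "Compliance": 0.5, "Sociality": 0.6, "Resilience": 0.2},
]

correlations = axis_correlations(profiles)
largest_pair = max(correlations, key=lambda pair: abs(correlations[pair]))
print(largest_pair, round(correlations[largest_pair], 3))
```

With real profiles, the axes would be considered largely independent under the abstract's criterion if the largest absolute correlation stayed below 0.42. With only ten models, any such estimate has wide uncertainty, so the threshold is better read as descriptive than as a significance test.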

