Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another's outputs. While value alignment is typically evaluated for isolated models, how value perturbations propagate through agent interactions remains poorly understood. We present ValueFlow, a perturbation-based framework that measures value drift in multi-agent systems using a 56-value evaluation dataset derived from the Schwartz Value Survey, with agent value orientations scored by an LLM-as-a-judge protocol. ValueFlow decomposes value drift into agent-level response behavior and system-level structural effects, captured by two metrics: β-susceptibility, an agent's sensitivity to perturbed peer value signals, and system susceptibility (SS), the effect of node-level perturbations on final system outputs. Experiments span value dimensions, backbones, personas, and topologies, and show that susceptibility varies sharply across values and is strongly shaped by interaction structure, indicating that value alignment in multi-agent systems is a system-level property, not just an agent-level one. ValueFlow thus provides a principled basis for auditing and mitigating value propagation in deployed multi-agent systems.