Large language model (LLM) agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks that assign specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing effectively leverages domain-expert knowledge, we find that it simultaneously induces a human-like cognitive bias known as Actor-Observer Asymmetry (AOA). Specifically, an agent acting as an actor (during self-reflection) tends to attribute failures to external factors, whereas an agent acting as an observer (during mutual auditing) attributes the same failures to internal faults. We quantify this effect with our new Ambiguous Failure Benchmark, which reveals that simply swapping perspectives triggers the AOA effect in over 20% of cases for most models. To tame this bias, we introduce ReTAS (Reasoning via Thesis-Antithesis-Synthesis), a model trained through dialectical alignment to enforce perspective-invariant reasoning. By integrating dialectical chain-of-thought with Group Relative Policy Optimization, ReTAS guides agents to synthesize conflicting viewpoints into an objective consensus. Experiments demonstrate that ReTAS effectively mitigates attribution inconsistency and significantly improves fault resolution rates in ambiguous scenarios.
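To make the measured quantity concrete, the following is a minimal sketch of how an AOA rate could be computed when each failure case is attributed once from the actor perspective and once from the observer perspective. The abstract does not specify the benchmark's scoring, so the `AttributionPair` type, the "internal"/"external" labels, and the rule in `is_aoa` are illustrative assumptions, and the sample data is made up rather than taken from the paper's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributionPair:
    """Attributions for one failure case, judged from both perspectives.

    Hypothetical structure: the labels "internal" and "external" are an
    assumed encoding, not the benchmark's actual schema.
    """
    actor: str     # label given during self-reflection
    observer: str  # label given during mutual auditing of the same failure


def is_aoa(pair: AttributionPair) -> bool:
    """Assumed rule: AOA occurs when the actor blames external factors
    while the observer blames internal faults for the same failure."""
    return pair.actor == "external" and pair.observer == "internal"


def aoa_rate(pairs: list[AttributionPair]) -> float:
    """Fraction of cases showing the asymmetry (0.0 for an empty list)."""
    if not pairs:
        return 0.0
    return sum(is_aoa(p) for p in pairs) / len(pairs)


# Made-up example data, for illustration only.
pairs = [
    AttributionPair("external", "internal"),  # asymmetric
    AttributionPair("internal", "internal"),  # consistent
    AttributionPair("external", "external"),  # consistent
    AttributionPair("external", "internal"),  # asymmetric
    AttributionPair("internal", "external"),  # inconsistent, reverse direction
]

rate = aoa_rate(pairs)
print(f"AOA rate: {rate:.0%}")  # prints "AOA rate: 40%"
```

Under this assumed rule, only flips in the actor-external, observer-internal direction count; a stricter consistency metric could count any disagreement between the two perspectives instead.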