While Large Language Model (LLM) agents are often studied from the angle of action planning and generation to accomplish a goal (e.g., one specified by a language description), their ability to collaborate with one another toward a joint goal remains underexplored. To address this gap, this paper studies LLM agents in task collaboration, particularly under information asymmetry, where agents differ in their knowledge and skills and must work together to complete a shared task. We extend Einstein Puzzles, a classical symbolic puzzle, to a tabletop game in which two LLM agents must reason, communicate, and act to satisfy the spatial and relational constraints required to solve the puzzle. We apply a fine-tuning-plus-verifier framework in which LLM agents are equipped with various communication strategies and with verification signals from the environment. Empirical results highlight the critical importance of aligned communication, especially when agents possess both information-seeking and information-providing capabilities. Interestingly, agents without communication can still achieve high task performance; however, further analysis reveals a lack of true rule understanding and lower trust from human evaluators. In contrast, by integrating an environment-based verifier, we enhance agents' ability to comprehend task rules and complete tasks, promoting safer and more interpretable collaboration in AI systems. Code is available at https://github.com/Roihn/EinsteinPuzzles.
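To make the environment-based verification idea concrete, below is a minimal sketch of how an environment-side check over spatial and relational constraints might look. The `State` representation, the `same_cell`/`left_of` constraint vocabulary, and the `verify` function are illustrative assumptions for exposition, not the paper's actual interface or implementation.

```python
# Minimal sketch of an environment-side verifier for Einstein-style puzzles.
# The state representation and constraint vocabulary here are illustrative
# assumptions, not the paper's actual API.

from typing import Callable

# A puzzle state: one dict of attribute assignments per board position.
State = list[dict[str, str]]

def find(state: State, attr: str, value: str) -> int | None:
    """Return the position holding the given attribute value, if placed."""
    for i, cell in enumerate(state):
        if cell.get(attr) == value:
            return i
    return None

def same_cell(a: tuple[str, str], b: tuple[str, str]) -> Callable[[State], bool]:
    """Relational constraint: both attribute values occupy the same position."""
    return lambda s: (find(s, *a) is not None) and find(s, *a) == find(s, *b)

def left_of(a: tuple[str, str], b: tuple[str, str]) -> Callable[[State], bool]:
    """Spatial constraint: value a is placed strictly left of value b."""
    return lambda s: (
        (pa := find(s, *a)) is not None
        and (pb := find(s, *b)) is not None
        and pa < pb
    )

def verify(state: State, constraints: list[Callable[[State], bool]]) -> bool:
    """Environment-side check: does the current board satisfy every rule?"""
    return all(c(state) for c in constraints)

if __name__ == "__main__":
    board: State = [
        {"color": "red", "pet": "dog"},
        {"color": "green", "pet": "cat"},
        {"color": "blue", "pet": "fish"},
    ]
    rules = [
        same_cell(("color", "red"), ("pet", "dog")),
        left_of(("color", "green"), ("color", "blue")),
    ]
    print(verify(board, rules))  # True for this toy assignment
```

A verifier of this shape returns a binary signal per constraint set, which an agent can use as environment feedback after each proposed move rather than relying solely on its own (possibly misaligned) reasoning.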