Large language models (LLMs) possess extensive world knowledge, yet methods for effectively eliciting this knowledge remain underexplored. Nationality and region prediction tasks require an understanding not only of linguistic features but also of cultural and historical background, which makes LLM world knowledge particularly valuable. However, conventional LLM prompting methods rely on direct reasoning, which is limited in its ability to apply abstract linguistic rules. We propose LLM Associative Memory Agents (LAMA), a novel framework that uses LLM world knowledge as associative memory. Rather than inferring nationality directly from a name, LAMA recalls famous individuals who share that name and aggregates their nationalities, a form of indirect reasoning. A dual-agent architecture, comprising a Person Agent and a Media Agent that specialize in different knowledge domains, recalls famous individuals in parallel; Top-1 predictions are generated through voting and Top-K predictions through conditional completion. On a 99-country nationality prediction task, LAMA achieved an accuracy of 0.817, substantially outperforming conventional LLM prompting methods and neural models. Our experiments reveal three findings: LLMs are more reliable at recalling concrete examples than at abstract reasoning; recall-based approaches remain robust on low-frequency nationalities regardless of the data frequency distribution; and the two agents complement each other, producing synergistic effects. These results demonstrate the effectiveness of a new kind of multi-agent system that retrieves and aggregates LLM knowledge rather than prompting the model to reason.
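The recall-and-aggregate procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the agents are stubbed with toy data instead of LLM calls, the names `person_agent`, `media_agent`, `complete`, and `predict` are hypothetical, the agents are called sequentially rather than in parallel, and "conditional completion" is interpreted, as an assumption, as filling the remaining Top-K slots given the candidates that have already received votes.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Recall = Tuple[str, str]              # (recalled individual, nationality)
Agent = Callable[[str], List[Recall]]
Completer = Callable[[str, List[str]], List[str]]


def person_agent(name: str) -> List[Recall]:
    # Hypothetical stub: a real agent would prompt an LLM to recall
    # famous people who share the name. Toy data only.
    return [("Person A", "Japan"), ("Person B", "Brazil")]


def media_agent(name: str) -> List[Recall]:
    # Hypothetical stub for the media-oriented agent. Toy data only.
    return [("Character C", "Japan")]


def complete(name: str, voted: List[str]) -> List[str]:
    # Hypothetical stub: a real completer would ask an LLM for further
    # candidates conditioned on the already-voted nationalities.
    return ["Japan", "Peru", "Brazil"]


def predict(name: str, agents: List[Agent], completer: Completer,
            k: int = 3) -> Tuple[Optional[str], List[str]]:
    # Pool the recalls of all agents (sequentially here, for simplicity).
    recalls = [r for agent in agents for r in agent(name)]
    # Vote: rank nationalities by how often they were recalled.
    # Counter.most_common keeps first-seen order among equal counts.
    ranked = [nat for nat, _ in Counter(n for _, n in recalls).most_common()]
    top1 = ranked[0] if ranked else None
    topk = ranked[:k]
    # Assumed form of conditional completion: fill any remaining slots.
    if len(topk) < k:
        for nat in completer(name, topk):
            if nat not in topk:
                topk.append(nat)
            if len(topk) == k:
                break
    return top1, topk


top1, topk = predict("Example", [person_agent, media_agent], complete, k=3)
print(top1, topk)
```

With the toy data, "Japan" receives two votes and "Brazil" one, so voting determines the first two ranks and the completion step supplies only the third. Swapping the stubs for real LLM calls would leave the aggregation logic unchanged.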