Recent research shows that Large Language Models (LLMs) exhibit a compelling level of proficiency in Theory of Mind (ToM) tasks. This ability to impute unobservable mental states to others is vital to human social cognition and may prove equally important in principal-agent relations between individual humans and Artificial Intelligences (AIs). In this paper, we explore how Violation of Expectation (VoE), a mechanism studied in developmental psychology, can be implemented to reduce errors in LLM predictions about users by leveraging emergent ToM affordances. We also introduce a \textit{metacognitive prompting} framework that applies VoE in the context of an AI tutor. By storing and retrieving facts derived from cases where the LLM's expectation about the user was violated, we find that LLMs are able to learn about users in ways that echo theories of human learning. Finally, we discuss latent hazards and augmentative opportunities associated with modeling user psychology, and we propose ways to mitigate risk along with possible directions for future inquiry.