Natural language interaction with agentic Artificial Intelligence (AI), driven by Large Language Models (LLMs), is expected to remain a dominant paradigm in the near future. While humans instinctively align their communication with the mental states of others, an ability known as Theory of Mind (ToM), current LLM-powered systems exhibit significant limitations in this regard. This study examines the extent to which open-source language models (LLaMA) can capture and preserve ToM-related information, and how effectively that information contributes to consistent ToM reasoning in generated responses. We further investigate whether explicit manipulation of ToM-related components, such as beliefs, desires, and intentions, can enhance response alignment. Experiments on two LLaMA 3 variants demonstrate that incorporating ToM-informed alignment improves response quality, achieving win rates of 67% and 63% for the 3B and 8B models, respectively. These findings highlight the potential of ToM-driven strategies to improve alignment in LLM-based conversational agents.