We present an approach for enhancing non-player characters (NPCs) in games by combining large language models (LLMs) with computer vision to provide contextual awareness of their surroundings. Conventional NPCs typically rely on pre-scripted dialogue and lack spatial understanding, which limits their responsiveness to player actions and reduces overall immersion. Our method addresses these limitations by capturing panoramic images of an NPC's environment and applying semantic segmentation to identify objects and their spatial positions. The extracted information is used to generate a structured JSON representation of the environment: object locations derived from segmentation are combined with additional scene graph data within the NPC's bounding sphere and encoded as directional vectors. This representation is provided as input to the LLM, enabling NPCs to incorporate spatial knowledge into player interactions. As a result, NPCs can dynamically reference nearby objects, landmarks, and environmental features, leading to more believable and engaging gameplay. We describe the technical implementation of the system and evaluate it in two stages. First, an expert interview was conducted to gather feedback and identify areas for improvement. After integrating these refinements, a user study was performed, showing that participants preferred the context-aware NPCs over a non-context-aware baseline, confirming the effectiveness of the proposed approach.
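To make the environment representation concrete, the following is a minimal sketch of how segmented objects inside an NPC's bounding sphere might be encoded as directional vectors and serialized to JSON for the LLM prompt. The function names (`direction_vector`, `build_context`), the field names, and the example detections are illustrative assumptions, not the paper's actual schema.

```python
import json
import math

def direction_vector(npc_pos, obj_pos):
    """Unit vector pointing from the NPC toward the object (assumed encoding)."""
    delta = [o - n for o, n in zip(obj_pos, npc_pos)]
    norm = math.sqrt(sum(c * c for c in delta))
    return [round(c / norm, 3) for c in delta]

def build_context(npc_pos, radius, detections):
    """Keep only detections inside the NPC's bounding sphere and encode
    each as a label plus a direction vector and distance, as JSON."""
    objects = []
    for label, pos in detections:
        dist = math.dist(npc_pos, pos)
        if dist <= radius:
            objects.append({
                "label": label,
                "direction": direction_vector(npc_pos, pos),
                "distance": round(dist, 2),
            })
    return json.dumps({"npc_position": list(npc_pos), "objects": objects}, indent=2)

# Hypothetical semantic-segmentation output: (label, world position).
detections = [
    ("fountain", (2.0, 0.0, 0.0)),
    ("market_stall", (0.0, 0.0, 5.0)),
    ("mountain", (80.0, 10.0, 0.0)),  # outside the 10 m bounding sphere
]
context_json = build_context((0.0, 0.0, 0.0), 10.0, detections)
print(context_json)
```

The resulting JSON string can be prepended to the LLM's system prompt so the NPC can answer questions such as "what is near you?" with grounded spatial references.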