In virtual reality (VR) educational scenarios, pedagogical agents (PAs) enhance immersive learning through realistic appearances and interactive behaviors. However, most existing PAs rely on static speech and simple gestures, which limits their ability to adapt dynamically to the semantic context of instructional content; as a result, interactions often lack naturalness and effectiveness during teaching. To address this challenge, this study proposes a large language model (LLM)-driven multimodal expression generation method that constructs semantically sensitive prompts to produce coordinated speech and gesture instructions, enabling dynamic alignment between instructional semantics and multimodal expressive behavior. A VR-based PA prototype was developed and evaluated through subjective, user-experience-oriented experiments. Results indicate that dynamically generated multimodal expressions significantly enhance learners' perceived learning effectiveness, engagement, and intention to use, while effectively alleviating fatigue and boredom during learning. Furthermore, the combined dynamic expression of speech and gestures notably strengthens learners' perceptions of human-likeness and social presence. These findings offer new insights and design guidelines for building more immersive, naturally expressive intelligent PAs.
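The pipeline the abstract describes (build a semantics-sensitive prompt, query an LLM, and parse coordinated speech and gesture instructions) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the prompt wording, the JSON schema, and the function names are all hypothetical, and a stubbed reply stands in for a real model call.

```python
import json


def build_prompt(lesson_text: str) -> str:
    # Hypothetical semantics-sensitive prompt: asks the LLM to key each
    # gesture cue to semantically salient words in the utterance.
    return (
        "You are a VR pedagogical agent. For the lesson content below, "
        "return JSON with 'speech' (the utterance to synthesize) and "
        "'gestures' (a list of {word, gesture} pairs aligned to the "
        "semantic content of the speech).\n\n"
        f"Lesson: {lesson_text}"
    )


def parse_response(raw: str):
    # Assumed LLM output shape (not the paper's schema):
    # {"speech": "...", "gestures": [{"word": "...", "gesture": "..."}]}
    data = json.loads(raw)
    return data["speech"], data["gestures"]


# Stubbed reply standing in for the LLM; a real system would send
# build_prompt(...) to a model endpoint here.
reply = json.dumps({
    "speech": "Water expands when it freezes.",
    "gestures": [{"word": "expands", "gesture": "open_palms_outward"}],
})
speech, gestures = parse_response(reply)
```

In this sketch, the gesture list is what a VR animation controller would consume to trigger gesture clips as the matching words are spoken, which is one plausible way to realize the speech-gesture coordination the abstract evaluates.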