The rapid evolution of large language models (LLMs) has transformed human-computer interaction (HCI), but interaction with LLMs remains largely text-based, and multimodal approaches are still under-explored. This paper introduces VTutor, an open-source Software Development Kit (SDK) that combines generative AI with advanced animation technologies to create engaging, adaptable, and realistic animated pedagogical agents (APAs) for human-AI multimedia interaction. VTutor leverages LLMs for real-time personalized feedback, advanced lip synchronization for natural speech alignment, and WebGL rendering for seamless web integration. Supporting a variety of 2D and 3D character models, VTutor enables researchers and developers to design emotionally resonant, contextually adaptive learning agents. The toolkit enhances learner engagement, feedback receptivity, and human-AI interaction while promoting trustworthy AI principles in education. VTutor sets a new standard for next-generation APAs, offering an accessible, scalable solution for fostering meaningful and immersive human-AI interaction experiences. The VTutor project is open source and welcomes community-driven contributions and showcases.