The rapid evolution of large language models (LLMs) has transformed human-computer interaction (HCI), but interaction with LLMs remains largely text-based, while other multimodal approaches remain under-explored. This paper introduces VTutor, an open-source Software Development Kit (SDK) that combines generative AI with advanced animation technologies to create engaging, adaptable, and realistic animated pedagogical agents (APAs) for multimedia human-AI interaction. VTutor leverages LLMs for real-time personalized feedback, advanced lip synchronization for natural speech alignment, and WebGL rendering for seamless web integration. Supporting a variety of 2D and 3D character models, VTutor enables researchers and developers to design emotionally resonant, contextually adaptive learning agents. The toolkit enhances learner engagement, feedback receptivity, and human-AI interaction while promoting trustworthy AI principles in education. VTutor sets a new standard for next-generation APAs, offering an accessible, scalable solution for fostering meaningful and immersive human-AI interaction. The VTutor project is open source and welcomes community-driven contributions and showcases.