Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching

Slide-based teaching is widely used in higher education, yet in online, hybrid, and asynchronous contexts, slides often lose the instructor presence, narrative continuity, and expressive framing that help learners connect with course content. Full lecture video can partly restore these qualities, but it is time-consuming to record, revise, and reuse. This study presents a practice-based implementation of, and analytic reflection on, an open-source workflow for creating talking slide avatars. The workflow integrates OpenVoice for text-to-speech and authorized voice-style conversion with Ditto-TalkingHead for audio-driven talking-image synthesis, enabling instructors to transform a short script and an authorized or synthetic portrait image into a narrated video for slide decks or HTML-based lecture materials. Rather than treating this workflow only as a technical solution, the study frames talking slide avatars as multimodal communication artifacts at the intersection of digital pedagogy, aesthetic education, and art-technology practice. The paper documents the production pipeline, analyzes its communicative and aesthetic affordances, and proposes practical guidelines for script length, image selection, pacing, disclosure, accessibility, consent, and ethical use. Its contribution is not a validated learning intervention, but an educator-oriented open-source production model and communication-design framework. The study concludes that short, transparent, and carefully designed avatars may provide a reusable communication layer for introductions, transitions, reminders, and recaps when used selectively and with appropriate ethical safeguards.
