Multimodal Large Language Models (MLLMs) are beginning to enable new user experiences that flexibly generate content from a range of inputs, including images, text, speech, and video. These capabilities have the potential to enrich learning by letting users capture and interact with information through a variety of modalities. However, little is known about how educators envision MLLMs shaping future learning experiences, what challenges diverse teachers encounter when interpreting how these models work, and what practical needs should be considered for successful implementation in educational contexts. We investigated educator perspectives through formative workshops with 12 K-12 educators, in which participants brainstormed learning opportunities, discussed practical concerns for effective use, and prototyped their own MLLM-powered learning applications using Claude 3.5 and its Artifacts feature for previewing code-based output. We use case studies to illustrate two contrasting end-user approaches (teacher- and student-driven) and share insights about the opportunities and concerns expressed by our participants, ending with implications for leveraging MLLMs in future learning experiences.