Science demonstrations are important for effective STEM education, yet teachers face challenges in conducting them safely and consistently across repeated sessions; robots can help with both. However, current Vision-Language-Action (VLA) models require substantial computational resources and sacrifice language generation capabilities to maximize efficiency, making them unsuitable for resource-constrained educational settings that require interpretable systems capable of generating explanations. We present the \textit{Pedagogical VLA Framework}, which applies pedagogical alignment to lightweight VLA models through four components: text healing to restore language generation capabilities, large language model (LLM) distillation to transfer pedagogical knowledge, safety training for educational environments, and pedagogical evaluation tailored to science education contexts. We evaluate the framework on five science demonstrations spanning physics, chemistry, biology, and earth science, using an evaluation protocol developed in collaboration with science education experts. The evaluation assesses both task performance (success rate, protocol compliance, efficiency, safety) and pedagogical quality through teacher surveys and LLM-as-Judge assessment. We additionally provide a qualitative analysis of the generated text. Experimental results show that the framework achieves task performance comparable to baseline models while producing contextually appropriate educational explanations.