Large language models (LLMs) can empower teachers to build pedagogical conversational agents (PCAs) customized for their students. Because students differ in prior knowledge and motivation, teachers must review how well their PCAs adapt to diverse students. Existing chatbot reviewing methods (e.g., direct chat and benchmarks) are either labor-intensive across multiple iterations or limited to testing single-turn interactions. We present TeachTune, with which teachers can create simulated students and review PCAs by observing automated chats between PCAs and those simulated students. Our technical pipeline instructs an LLM-based student to simulate prescribed knowledge levels and traits, helping teachers explore diverse conversation patterns. The pipeline produced simulated students whose behaviors correlated highly with their specified knowledge and motivation levels, with accuracy gaps within 5% and 10%, respectively. Thirty science teachers designed PCAs in a between-subjects study, and using TeachTune resulted in a lower task load and higher student-profile coverage than a baseline.
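As an illustrative sketch only (not the paper's actual implementation), the simulated-student idea can be pictured as composing a role-play system prompt from a prescribed knowledge level, motivation trait, and misconceptions; all names and fields below are hypothetical, and the real TeachTune pipeline may structure its instructions quite differently.

```python
from dataclasses import dataclass, field

@dataclass
class StudentProfile:
    # Hypothetical schema; the paper's actual profile attributes may differ.
    knowledge_level: str            # e.g., "low", "medium", "high"
    motivation: str                 # e.g., "disengaged", "curious"
    misconceptions: list[str] = field(default_factory=list)

def build_student_system_prompt(profile: StudentProfile) -> str:
    """Compose a system prompt instructing an LLM to role-play a student
    with the prescribed knowledge level and traits."""
    if profile.misconceptions:
        misconception_text = (
            "You hold these misconceptions: " + "; ".join(profile.misconceptions) + "."
        )
    else:
        misconception_text = "You hold no notable misconceptions."
    return (
        "You are role-playing a student chatting with a teaching agent.\n"
        f"Your knowledge of the topic is {profile.knowledge_level}.\n"
        f"Your motivation to learn is {profile.motivation}.\n"
        f"{misconception_text}\n"
        "Stay in character; respond as this student would, not as an expert."
    )

# Example: a low-knowledge, disengaged student with one physics misconception.
profile = StudentProfile("low", "disengaged", ["heavier objects fall faster"])
prompt = build_student_system_prompt(profile)
print(prompt)
```

Feeding such a prompt to an LLM on one side of an automated chat, with the teacher-designed PCA on the other, yields the kind of reviewable transcripts the abstract describes.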