Chatbots based on large language models (LLMs) are increasingly adopted for information retrieval, text generation, and writing assistance. Their use is also growing rapidly in educational settings, where students rely on them to complete tasks, access information, and support their learning. However, the role of LLM-based chatbots in supporting learning and assessment in university-level computer science education remains underexplored. To address this gap, we investigate how well several LLM-based chatbots solve university-level multiple-choice questions (MCQs) and evaluate their ability to assist student learning. We developed 70 MCQs for a university lecture on interactive visual data analysis and evaluated the chatbots' performance under different prompt designs. We then compared the results with students' performance. Finally, we conducted a user study in two lectures (interactive visual data analysis and computer vision) to investigate how chatbot-generated answers and explanations affect students' performance. Chatbot performance differed significantly between the smaller models and the GPT-4o and GPT-5 models, which achieved the best results. The user study shows that presenting ChatGPT answers together with an explanation does not, in general, improve students' performance.