Large Language Models (LLMs) have advanced rapidly in recent years. One application of LLMs is to support student learning in educational settings. However, prior work has shown that LLMs still struggle to answer questions accurately within university-level computer science courses. In this work, we investigate how incorporating university course materials can enhance LLM performance in this setting. A key challenge lies in leveraging diverse course materials such as lecture slides and transcripts, which differ substantially from typical textual corpora: slides also contain visual elements like images and formulas, while transcripts contain spoken, less structured language. We compare two strategies, Retrieval-Augmented Generation (RAG) and Continual Pre-Training (CPT), to extend LLMs with course-specific knowledge. For lecture slides, we further explore a multi-modal RAG approach, where we present the retrieved content to the generator in image form. Our experiments reveal that, given the relatively small size of university course materials, RAG is more effective and efficient than CPT. Moreover, incorporating slides as images in the multi-modal setting significantly improves performance over text-only retrieval. These findings highlight practical strategies for developing AI assistants that better support learning and teaching, and we hope they inspire similar efforts in other educational contexts.