Large Language Models (LLMs) have advanced rapidly in recent years. One application of LLMs is to support student learning in educational settings. However, prior work has shown that LLMs still struggle to answer questions accurately within university-level computer science courses. In this work, we investigate how incorporating university course materials can enhance LLM performance in this setting. A key challenge lies in leveraging diverse course materials such as lecture slides and transcripts, which differ substantially from typical textual corpora: slides also contain visual elements like images and formulas, while transcripts contain spoken, less structured language. We compare two strategies, Retrieval-Augmented Generation (RAG) and Continual Pre-Training (CPT), to extend LLMs with course-specific knowledge. For lecture slides, we further explore a multi-modal RAG approach, where we present the retrieved content to the generator in image form. Our experiments reveal that, given the relatively small size of university course materials, RAG is more effective and efficient than CPT. Moreover, incorporating slides as images in the multi-modal setting significantly improves performance over text-only retrieval. These findings highlight practical strategies for developing AI assistants that better support learning and teaching, and we hope they inspire similar efforts in other educational contexts.