While online learning is growing and becoming widespread, the associated curricula often suffer from limited coverage and outdated content. A key question, therefore, is how to dynamically define the topics that must be covered to learn a subject (e.g., a course) thoroughly. Large Language Models (LLMs) are promising candidates for addressing such curriculum-development challenges. We therefore developed a framework and a novel dataset, built on YouTube, to evaluate LLMs' performance in generating learning topics for specific courses. The experiment spans over 100 courses and nearly 7,000 YouTube playlists across various subject areas. Our results indicate that, in terms of BERTScore, GPT-4 produces more accurate topics for the given courses than those extracted from YouTube video playlists.