This study evaluates the baseline capabilities of Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini to learn music theory concepts through in-context learning and chain-of-thought prompting. Using carefully designed prompts (in-context learning) and step-by-step worked examples (chain-of-thought prompting), we explore how LLMs can be taught increasingly complex material and how pedagogical strategies designed for human learners translate to educating machines. Performance is evaluated using questions from an official Canadian Royal Conservatory of Music (RCM) Level 6 examination, which covers a comprehensive range of topics, including interval and chord identification, key detection, cadence classification, and metrical analysis. Additionally, we evaluate the suitability of various music encoding formats for these tasks (ABC, Humdrum, MEI, MusicXML). All experiments were run both with and without contextual prompts. Results indicate that without context, ChatGPT with MEI performs best at 52% accuracy, while with context, Claude with MEI performs best at 75% accuracy. Future work will further refine prompts and expand coverage to more advanced music theory concepts. This research contributes to the broader understanding of teaching LLMs and has applications for educators, students, and developers of AI music tools alike.