This thesis develops a Whisper-based Transformer model that extracts melodies and chords from music audio and transcribes them into ABC notation. A complete data processing workflow is tailored to ABC notation, covering data cleaning, formatting, and conversion, and a mutation mechanism is implemented to increase the diversity and quality of the training data. The thesis introduces "Orpheus' Score", a custom notation system that converts musical information into tokens, together with a purpose-built vocabulary and a corresponding custom tokenizer. Experiments show that the model achieves significantly higher accuracy and performance than traditional algorithms. Beyond providing music enthusiasts with a convenient audio-to-score tool, this work offers new ideas and tools for research in music information processing.
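To make the idea of converting notation into tokens concrete, the following is a minimal sketch of how a small subset of ABC notation might be split into tokens and mapped to integer IDs. It is not the "Orpheus' Score" system itself, whose token set is not specified here; the pattern `ABC_TOKEN`, the functions `tokenize_abc` and `build_vocab`, and the special tokens `<pad>` and `<unk>` are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical token pattern covering only a small subset of ABC notation.
# This is an illustrative assumption, not the thesis's actual vocabulary.
ABC_TOKEN = re.compile(
    r'"[^"]*"'                    # quoted chord symbol, e.g. "Am"
    r"|\|\]|\|\||\|:|:\||\|"      # bar lines: |]  ||  |:  :|  |
    r"|[\^_=]*[A-Ga-gz][,']*"     # accidentals, pitch letter or rest, octave marks
    r"|\d+/\d+|/+\d*|\d+"         # note-length modifiers, e.g. 3/2, /2, 2
)


def tokenize_abc(body: str) -> list[str]:
    """Split an ABC tune body into tokens; unmatched characters are skipped."""
    return ABC_TOKEN.findall(body)


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {"<pad>": 0, "<unk>": 1}  # assumed special tokens
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


tokens = tokenize_abc('"C"C2 E2|"G"G,4|]')
vocab = build_vocab(tokens)
ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
```

In this sketch, chord symbols, pitches, durations, and bar lines become separate tokens, so a sequence model can predict them one at a time. A trained tokenizer, as described in the abstract, would typically learn or fix a richer vocabulary than this hand-written pattern.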