Persian music, with its distinctive tonalities, modal systems (Dastgah), and rhythmic structures, presents significant challenges for music generation models trained primarily on Western music. We address this gap by curating the first large-scale dataset of Persian songs, comprising over 900 hours of high-quality audio across diverse sub-genres, including pop, traditional, and contemporary styles. This dataset captures the rich melodic and cultural diversity of Persian music and serves as the foundation for fine-tuning MusicGen, a state-of-the-art generative music model. We adapt MusicGen to this domain and evaluate its performance with both subjective and objective metrics. To assess the semantic alignment between the generated music and the intended style tags, we report the proportion of relevant tags that are accurately reflected in the generated outputs. Our results demonstrate that the fine-tuned model produces compositions that align more closely with Persian stylistic conventions. This work introduces a new resource for generative music research and illustrates how music generation models can be adapted to underrepresented cultural and linguistic contexts.