As an important part of Music Information Retrieval (MIR), Symbolic Music Understanding (SMU) has gained substantial attention, as it can assist musicians and amateurs in learning and creating music. Recently, pre-trained language models have been widely adopted in SMU because symbolic music shares substantial similarity with natural language, and the pre-training paradigm helps make full use of limited music data. However, bias, such as sexism, ageism, and racism, has been observed in pre-trained language models and is attributed to the imbalanced distribution of training data. Such bias significantly degrades the performance of downstream tasks, and the same problem arises in SMU. To address this challenge, we propose Adversarial-MidiBERT, a symbolic music understanding model based on Bidirectional Encoder Representations from Transformers (BERT). We introduce an unbiased pre-training method based on adversarial learning to minimize the participation of tokens that lead to bias during training. Furthermore, we propose a mask fine-tuning method to narrow the data gap between pre-training and fine-tuning, which helps the model converge faster and perform better. We evaluate our method on four music understanding tasks, and our approach demonstrates excellent performance on all of them. The code for our model is publicly available at https://github.com/RS2002/Adversarial-MidiBERT.