We employ Natural Language Processing techniques to analyse 377,808 English song lyrics from the "Two Million Song Database" corpus, focusing on the expression of sexism across five decades (1960-2010) and on the measurement of gender biases. Using a sexism classifier, we identify sexist lyrics at a larger scale than previous studies, which relied on small samples of manually annotated popular songs. Furthermore, we reveal gender biases by measuring associations in word embeddings learned on song lyrics. We find that sexist content increases over time, especially in songs by male artists and in popular songs appearing in Billboard charts. Songs also contain different language biases depending on the gender of the performer, with songs by male solo artists containing more and stronger biases. This is the first large-scale analysis of this type, giving insights into language usage in such an influential part of popular culture.
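To make "measuring associations in word embeddings" concrete, the following is a minimal sketch of one common association measure, a WEAT-style effect size. The abstract does not state which measure the study uses, so the choice of WEAT is an assumption, and the word lists, the function names (`cosine`, `association`, `weat_effect_size`) and the two-dimensional vectors in `toy_emb` are all hypothetical values invented for illustration; they are not learned from lyrics.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attrs_a, attrs_b, emb):
    """Mean similarity of `word` to attribute set A minus that to set B."""
    sim_a = mean([cosine(emb[word], emb[a]) for a in attrs_a])
    sim_b = mean([cosine(emb[word], emb[b]) for b in attrs_b])
    return sim_a - sim_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b, emb):
    """WEAT-style effect size: difference of the mean associations of the
    two target sets, divided by the standard deviation over all targets.
    Assumption: the sample standard deviation (n - 1) is used here."""
    s_x = [association(x, attrs_a, attrs_b, emb) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b, emb) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy 2-d vectors, invented for illustration only (NOT learned from lyrics).
toy_emb = {
    "he": [1.0, 0.0], "him": [0.9, 0.1],
    "she": [0.0, 1.0], "her": [0.1, 0.9],
    "career": [0.8, 0.2], "salary": [0.7, 0.3],
    "home": [0.2, 0.8], "family": [0.3, 0.7],
}

effect = weat_effect_size(
    targets_x=["career", "salary"],
    targets_y=["home", "family"],
    attrs_a=["he", "him"],
    attrs_b=["she", "her"],
    emb=toy_emb,
)
print(f"WEAT-style effect size: {effect:.3f}")
```

A positive value means the first target set sits closer to the first attribute set than the second target set does; swapping the two target sets flips the sign. In a real setting, `emb` would be replaced by vectors trained on the lyrics of each group of performers, so that the effect sizes can be compared across groups.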