We employ Natural Language Processing techniques to analyse 377,808 English song lyrics from the "Two Million Song Database" corpus, focusing on the expression of sexism across five decades (1960-2010) and the measurement of gender biases. Using a sexism classifier, we identify sexist lyrics at a larger scale than previous studies, which relied on small samples of manually annotated popular songs. Furthermore, we reveal gender biases by measuring associations in word embeddings learned on song lyrics. We find that sexist content increases over time, especially in songs by male artists and in popular songs appearing in the Billboard charts. Songs are also shown to contain different language biases depending on the gender of the performer, with songs by male solo artists containing more and stronger biases. This is the first large-scale analysis of this type, giving insights into language usage in such an influential part of popular culture.
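To illustrate what measuring associations in word embeddings can look like, the following is a minimal sketch, not the method used in this work. It assumes a WEAT-style score (the difference between a word's mean cosine similarity to two attribute sets), which the abstract does not specify. The function names `cosine` and `association`, the two-dimensional vectors, and the labels `target_1` and `target_2` are all hypothetical and invented for illustration. Real embeddings would be learned from the lyrics corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attr_a, attr_b):
    """Mean similarity to attribute set A minus mean similarity to set B.

    A positive value means the word sits closer to set A,
    a negative value means it sits closer to set B.
    """
    mean_a = sum(cosine(word_vec, a) for a in attr_a) / len(attr_a)
    mean_b = sum(cosine(word_vec, b) for b in attr_b) / len(attr_b)
    return mean_a - mean_b


# Toy 2-d vectors, invented for illustration only (assumption, not real data).
toy = {
    "he": [1.0, 0.0],
    "she": [0.0, 1.0],
    "target_1": [0.9, 0.1],
    "target_2": [0.1, 0.9],
}

male_terms = [toy["he"]]
female_terms = [toy["she"]]

score_1 = association(toy["target_1"], male_terms, female_terms)
score_2 = association(toy["target_2"], male_terms, female_terms)

print(f"target_1 association: {score_1:+.3f}")
print(f"target_2 association: {score_2:+.3f}")
```

In this toy setup, a positive score indicates a word lying closer to the male attribute terms and a negative score a word lying closer to the female ones. Comparing such scores between embeddings trained on different subsets of lyrics (for example, grouped by performer gender) is one way the biases described above could be contrasted.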