The presence of toxic and gender-identity derogatory language in open-source software (OSS) communities has recently become a focal point for researchers. Such comments not only frustrate developers and reduce their engagement but may also prompt them to leave OSS projects. Despite ample evidence that diverse teams enhance productivity, toxic or gender-identity-discriminatory communication poses a significant threat to the participation of individuals from marginalized groups and may therefore act as a barrier to diversity and inclusion in OSS projects. However, little research has explored the association between gender-based toxic and derogatory language and the perceptible diversity of OSS teams. Consequently, this study aims to investigate how such content influences the gender, ethnicity, and tenure diversity of OSS development teams. To this end, we extract data from active GitHub projects, assess various project characteristics, and identify instances of toxic and gender-discriminatory language in issue and pull request comments. Using these attributes, we construct a regression model to explore how they are associated with the perceptible diversity of those projects.