The presence of toxic and gender-identity-derogatory language in open-source software (OSS) communities has recently become a focal point for researchers. Such comments not only frustrate and disengage developers but may also prompt them to leave OSS projects. Although ample evidence suggests that diverse teams are more productive, toxic or gender-identity-discriminatory communication threatens the participation of individuals from marginalized groups and may therefore act as a barrier to diversity and inclusion in OSS projects. However, little research has examined the association between gender-based toxic and derogatory language and the perceptible diversity of OSS teams. This study therefore investigates how such content influences the gender, ethnicity, and tenure diversity of OSS development teams. To this end, we extract data from active GitHub projects, assess various project characteristics, and identify instances of toxic and gender-discriminatory language in issue and pull request comments. Using these attributes, we construct a regression model to explore how they are associated with the perceptible diversity of those projects.
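As a rough illustration of the kind of analysis outlined above, the following minimal sketch computes a per-project diversity score and regresses it on the share of toxic comments. Everything in it is an assumption made for illustration: the abstract does not specify the diversity measure (the Blau index is used here as one common choice), the regression form (a single-predictor least-squares fit stands in for the actual model), or the data layout (the `projects` records and their field names are hypothetical toy data, not results).

```python
from collections import Counter


def blau_index(labels):
    """Blau diversity index: 1 - sum of squared category shares.

    Assumed here as the diversity measure; 0.0 means a homogeneous team.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def toxic_share(flags):
    """Fraction of comments flagged as toxic (flags are 0/1 values)."""
    return sum(flags) / len(flags) if flags else 0.0


def simple_ols(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy records: perceived gender labels of contributors and
# 0/1 toxicity flags for issue / pull request comments in each project.
projects = [
    {"genders": ["m", "m", "f", "f"], "toxic_flags": [0, 0, 0, 0]},
    {"genders": ["m", "m", "m", "f"], "toxic_flags": [1, 0, 0, 0]},
    {"genders": ["m", "m", "m", "m"], "toxic_flags": [1, 1, 0, 0]},
]

x = [toxic_share(p["toxic_flags"]) for p in projects]
y = [blau_index(p["genders"]) for p in projects]
intercept, slope = simple_ols(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

A fit of this kind describes an association only; a fuller model would also include the project characteristics as covariates, and analogous scores could be computed for ethnicity and tenure.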