Code Reviews in Open Source Projects: How Do Gender Biases Affect Participation and Outcomes?

Context: Contemporary software development organizations lack diversity, and the proportion of women in free and open-source software (FOSS) communities is even lower than the industry average. Although the results of recent studies hint at the existence of biases against women, it is unclear to what extent such biases influence the outcomes of various software development tasks. Aim: We aim to identify whether the outcomes of, or participation in, code reviews (or pull requests) are influenced by the gender of a developer. Approach: With this goal, this study includes a total of 1,010 FOSS projects. We developed six regression models for each of the 14 datasets (i.e., 10 Gerrit-based and four GitHub-based) to identify whether code acceptance, review intervals, and code review participation differ based on the gender and gender-neutral profile of a developer. Key findings: We find significant gender biases in code acceptance in 13 of the 14 datasets, with seven favoring men and the remaining six favoring women. We also found significant differences between men and women in code review intervals, with women encountering longer delays in three cases and men in seven. Our results indicate that reviewer selection is one of the most gender-biased aspects in most of the projects, with women having significantly lower code review participation in 11 of the 14 cases. Since most review assignments are based on invitations, this result suggests possible affinity biases among the developers. Conclusion: Though gender bias exists in many projects, the direction and amplitude of the bias vary with project size, community, and culture. Similar bias mitigation strategies may not work across all communities, as the characteristics of biases and their underlying causes differ.
