Community Question-Answering platforms, such as Stack Overflow (SO), are valuable resources for knowledge exchange and problem solving. These platforms incorporate mechanisms to assess the quality of answers and participants' expertise, ideally free from discriminatory biases. However, prior research has highlighted persistent gender biases, raising concerns about the inclusivity and fairness of these systems. Addressing such biases is crucial for fostering equitable online communities. While previous studies focus on detecting gender bias by comparing male and female user characteristics, they often overlook the interaction between genders, inherent answer quality, and the selection of ``best answers'' by question askers. In this study, we investigate whether answer quality is influenced by gender, using a combination of human evaluations and automated assessments powered by Large Language Models. Our findings reveal no significant gender differences in answer quality, nor any substantial influence of gender bias on the selection of ``best answers''. Instead, we find that the significant gender disparities in SO's reputation scores are primarily attributable to differences in users' activity levels, e.g., the number of questions and answers they write. Our results have important implications for the design of scoring systems in community question-answering platforms. In particular, reputation systems that heavily emphasize activity volume risk amplifying gender disparities that do not reflect actual differences in answer quality, calling for more equitable design strategies.