LLMs are increasingly used to boost productivity and support software engineering tasks. However, when applied to socially sensitive decisions such as team composition and task allocation, they raise concerns of fairness. Prior studies have revealed that LLMs may reproduce stereotypes; however, these analyses remain exploratory and examine sensitive attributes in isolation. This study investigates whether LLMs exhibit bias in team composition and task assignment by analyzing the combined effects of candidates' country and pronouns. Using three LLMs and 3,000 simulated decisions, we find systematic disparities: demographic attributes significantly shaped both selection likelihood and task allocation, even when accounting for expertise-related factors. Task distributions further reflected stereotypes, with technical and leadership roles unevenly assigned across groups. Our findings indicate that LLMs exacerbate demographic inequities in software engineering contexts, underscoring the need for fairness-aware assessment.
翻译:大型语言模型(LLM)正日益广泛地应用于提升生产力及支持软件工程任务。然而,当将其应用于团队组建和任务分配等社会敏感性决策时,其公平性问题引发了广泛担忧。已有研究表明,LLM可能重现刻板印象;然而,这些分析仍处于探索阶段,且多孤立地考察单一敏感属性。本研究通过分析候选人国籍与代词称谓的复合效应,探究LLM在团队组成与任务分配中是否存在偏见。基于三种LLM模型和3000次模拟决策的数据分析,我们发现了系统性差异:即使在控制专业能力相关因素后,人口统计学属性仍显著影响候选人的入选概率及任务分配。任务分布进一步反映出刻板印象,技术类与领导类职位在不同群体间呈现不均衡分配。我们的研究结果表明,LLM会加剧软件工程领域的人口统计学不平等,这凸显了开展公平性评估的迫切需求。