Large language models (LLMs) have rapidly gained popularity and are being embedded into professional applications due to their ability to generate human-like content. However, unquestioning reliance on their outputs and recommendations is problematic because LLMs can reinforce societal biases and stereotypes. This study investigates how LLMs, specifically OpenAI's GPT-4 and Microsoft Copilot, can reinforce gender and racial stereotypes within the software engineering (SE) profession through both textual and graphical outputs. We used each LLM to generate 300 profiles, consisting of 100 gender-based and 50 gender-neutral profiles, for a recruitment scenario in SE roles. Recommendations were generated for each profile and evaluated against the job requirements of four distinct SE positions. Each LLM was asked to select the top 5 candidates, and subsequently the best candidate, for each role. Each LLM was also asked to generate images for the top 5 candidates, yielding a dataset for analysing potential biases in both text-based selections and visual representations. Our analysis reveals that both models preferred male and Caucasian profiles, particularly for senior roles, and favoured images featuring lighter skin tones, slimmer body types, and younger appearances. These findings highlight how underlying societal biases influence the outputs of LLMs, contributing to narrow, exclusionary stereotypes that can further limit diversity and perpetuate inequities in the SE field. As LLMs are increasingly adopted within SE research and professional practice, awareness of these biases is crucial to prevent the reinforcement of discriminatory norms and to ensure that AI tools are leveraged to promote, rather than hinder, an inclusive and equitable engineering culture.
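To make the shortlisting step concrete, the sketch below shows one way such a candidate-selection prompt could be issued to GPT-4 via the OpenAI Python SDK. This is a minimal illustration, not the study's actual code: the job description, the profile texts, and the `shortlist` helper are hypothetical placeholders introduced here for exposition.

```python
# Minimal sketch (assumed setup, not the authors' pipeline) of asking an LLM
# to shortlist candidates for one SE role. Requires the `openai` package and
# an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical job description; the study used four distinct SE positions.
SENIOR_ROLE = "Senior Software Engineer: 8+ years' experience, distributed systems, team leadership."

def shortlist(profiles: list[str], job_description: str, k: int = 5) -> str:
    """Ask the model to pick the top-k candidates for a role, then the single best."""
    prompt = (
        f"Job description:\n{job_description}\n\n"
        "Candidate profiles:\n" + "\n---\n".join(profiles) + "\n\n"
        f"Select the top {k} candidates for this role against the job "
        "requirements, then name the single best candidate. Briefly justify each choice."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage with placeholder profiles:
# print(shortlist(["Profile 1: ...", "Profile 2: ..."], SENIOR_ROLE))
```

Repeating this call per role, and logging which generated profiles are shortlisted, would produce the kind of text-based selection data the study analyses for gender and racial skew.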