We examine the representation of African American English (AAE) in large language models (LLMs), exploring (a) how effective Black Americans perceive these technologies to be at producing authentic AAE, and (b) in which contexts Black Americans find AAE output desirable. Through a survey of Black Americans ($n = 104$) and an annotation study in which Black Americans rated LLM-produced AAE ($n = 228$), we find that Black Americans favor choice and autonomy in determining when AAE is appropriate in LLM output. They tend to prefer that LLMs default to Mainstream U.S. English in formal settings, with greater interest in AAE production in less formal settings. When LLMs were appropriately prompted and provided with in-context examples, our participants rated the authenticity of the resulting AAE as on par with transcripts of Black American speech. Selected code and data for our project are available at https://github.com/smelliecat/AAEMime.git