Automatic Speech Recognition (ASR) systems are widely used in everyday communication, education, healthcare, and industry, yet their performance remains uneven across speakers, particularly when dialectal variation diverges from the mainstream accents represented in training data. This study investigates ASR bias through a sociolinguistic analysis of Newcastle English, a regional variety of North-East England that has been shown to challenge current speech recognition technologies. Using spontaneous speech from the Diachronic Electronic Corpus of Tyneside English (DECTE), we evaluate the output of a state-of-the-art commercial ASR system and conduct a fine-grained analysis of more than 3,000 transcription errors. Errors are classified by linguistic domain and examined in relation to social variables including gender, age, and socioeconomic status. In addition, an acoustic case study of selected vowel features demonstrates how gradient phonetic variation contributes directly to misrecognition. The results show that phonological variation accounts for the majority of errors, with recurrent failures linked to dialect-specific features such as vowel quality and glottalisation. Further recurrent failures are linked to local vocabulary and non-standard grammatical forms. Error rates also vary across social groups, with higher error frequencies observed for men and for speakers at the extremes of the age spectrum. These findings indicate that ASR errors are not random but socially patterned and can be explained from a sociolinguistic perspective. The study thus demonstrates the importance of incorporating sociolinguistic expertise into the evaluation and development of speech technologies and argues that more equitable ASR systems require explicit attention to dialectal variation and community-based speech data.