Can we trust AI to detect healthy multilingual English speakers among the cognitively impaired cohort in the UK? An investigation using real-world conversational speech

Madhurananda Pahar, Caitlin Illingworth, Dorota Braun, Bahman Mirheidari, Lise Sproson, Daniel Blackburn, Heidi Christensen

Conversational speech often reveals early signs of cognitive decline, such as dementia and mild cognitive impairment (MCI). In the UK, one in four people belongs to an ethnic minority, and dementia prevalence is expected to rise most rapidly among Black and Asian communities. This study examines the trustworthiness of AI models, specifically the presence of bias, in detecting healthy multilingual English speakers among the cognitively impaired cohort, with the aim of making these tools clinically beneficial. Monolingual participants were recruited nationally across the UK, and multilingual participants were enrolled from four community centres in Sheffield and Bradford. The multilingual participants spoke English with a non-native accent and also spoke Somali, Chinese, or South Asian languages. To test the AI tools more thoroughly, they were further divided by Yorkshire accent (West and South). Although automatic speech recognition (ASR) systems showed no significant bias across groups, classification and regression models using acoustic and linguistic features exhibited bias against multilingual speakers, particularly in memory, fluency, and reading tasks. This bias was more pronounced when models were trained on the publicly available DementiaBank dataset. Moreover, multilingual speakers were more likely to be misclassified as having cognitive decline. This study is the first of its kind to show that, despite their strong overall performance, current AI models are biased against multilingual individuals from ethnic minority backgrounds in the UK and are more likely to misclassify speakers with a South Yorkshire accent as living with more severe cognitive decline. We conclude from this pilot study that existing AI tools are not yet reliable for diagnostic use in these populations, and in future work we aim to address this by developing more generalisable, bias-mitigated models.
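
The finding that healthy multilingual speakers were more often misclassified as having cognitive decline amounts to comparing, across speaker groups, the fraction of healthy speakers that a model flags as impaired (a per-group false positive rate). The sketch that follows illustrates that comparison under stated assumptions: the function name `healthy_misclassification_rate`, the 0/1 label encoding, and the toy records are all hypothetical, and this is not the authors' evaluation code or data.

```python
from collections import defaultdict


def healthy_misclassification_rate(records):
    """Return, per group, the fraction of healthy speakers (true label 0)
    that a model predicted as cognitively impaired (predicted label 1).

    records: iterable of (group, true_label, predicted_label) tuples.
    Hypothetical encoding: 0 = healthy, 1 = cognitive decline.
    """
    healthy = defaultdict(int)   # number of healthy speakers per group
    flagged = defaultdict(int)   # healthy speakers predicted as impaired
    for group, y_true, y_pred in records:
        if y_true == 0:
            healthy[group] += 1
            if y_pred == 1:
                flagged[group] += 1
    # Groups with no healthy speakers are omitted (rate undefined).
    return {g: flagged[g] / healthy[g] for g in healthy}


# Hypothetical toy predictions, NOT results from the study.
records = [
    ("monolingual", 0, 0), ("monolingual", 0, 0),
    ("monolingual", 0, 1), ("monolingual", 0, 0),
    ("multilingual", 0, 1), ("multilingual", 0, 0),
    ("multilingual", 0, 1), ("multilingual", 0, 1),
    ("monolingual", 1, 1), ("multilingual", 1, 1),
]

rates = healthy_misclassification_rate(records)
print(rates)  # {'monolingual': 0.25, 'multilingual': 0.75}

# A ratio well above 1 means healthy multilingual speakers are flagged
# as impaired more often than healthy monolingual speakers.
ratio = rates["multilingual"] / rates["monolingual"]
print(ratio)  # 3.0
```

Restricting the count to speakers whose true label is healthy isolates the error the abstract highlights (healthy people wrongly flagged) from the model's overall accuracy, which can stay high while this per-group rate differs.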