Dialect differences arising from regional, social, and economic factors lead to performance discrepancies for many groups of language technology users. Inclusive and equitable language technology must be dialect invariant, meaning that performance remains constant under dialectal shifts. Current systems often fall short of this ideal because they are designed and tested on a single dialect: Standard American English (SAE). We introduce a suite of resources for evaluating and achieving English dialect invariance. The suite is called Multi-VALUE, a controllable, rule-based translation system spanning 50 English dialects and 189 unique linguistic features. Multi-VALUE maps SAE to synthetic forms of each dialect. First, we use this system to stress test question answering, machine translation, and semantic parsing. These stress tests reveal significant performance disparities for leading models on non-standard dialects. Second, we use this system as a data augmentation technique to improve the dialect robustness of existing systems. Finally, we partner with native speakers of Chicano and Indian English to release new gold-standard variants of the popular CoQA task. To execute the transformation code, run model checkpoints, and download both synthetic and gold-standard dialectal benchmark datasets, see http://value-nlp.org.
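To make the notion of a dialect stress test concrete, the following is a minimal sketch, not the Multi-VALUE implementation or its API. It assumes three toy rewrite rules (`RULES`), a hypothetical helper `to_synthetic_dialect`, and a deliberately brittle stand-in model (`brittle_model`). All of these names are illustrative assumptions; the real system covers 189 linguistically grounded features. The sketch rewrites SAE inputs with the rules and reports the accuracy drop, which would be zero for a dialect-invariant model.

```python
import re
from typing import Callable, List, Tuple

# Toy, hypothetical rewrite rules for illustration only.
# They are NOT the Multi-VALUE rule set.
RULES: List[Tuple[str, str]] = [
    (r"\bis not\b", "ain't"),
    (r"\bare not\b", "ain't"),
    (r"\bgoing to\b", "gonna"),
]

Example = Tuple[str, bool]  # (sentence, gold label: does it contain negation?)


def to_synthetic_dialect(text: str) -> str:
    """Apply each rewrite rule in order to an SAE sentence."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


def accuracy(model: Callable[[str], bool], examples: List[Example]) -> float:
    """Fraction of examples for which the model output matches the gold label."""
    correct = sum(model(text) == label for text, label in examples)
    return correct / len(examples)


def invariance_gap(model: Callable[[str], bool], examples: List[Example]) -> float:
    """Accuracy on SAE inputs minus accuracy on the rewritten inputs."""
    perturbed = [(to_synthetic_dialect(text), label) for text, label in examples]
    return accuracy(model, examples) - accuracy(model, perturbed)


def brittle_model(text: str) -> bool:
    """Stand-in negation detector that only recognizes the SAE token 'not'."""
    return "not" in text.split()


EXAMPLES: List[Example] = [
    ("She is not going to come", True),
    ("They are happy", False),
    ("We are not ready", True),
    ("He is going to win", False),
]

gap = invariance_gap(brittle_model, EXAMPLES)
```

On this toy data the stand-in model is correct on every SAE sentence but misses both negated sentences after rewriting, so `gap` is 0.5. The same rewritten examples could in principle be added to training data, which is the augmentation use the abstract describes.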