Research on cross-dialectal transfer from a standard variety to a non-standard dialect has typically focused on text data. However, dialects are primarily spoken, and non-standard spellings cause problems for text processing. We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems in which speech is first automatically transcribed and then processed by a text model. We focus on German dialects in the context of written and spoken intent classification, releasing the first dialectal audio intent classification dataset, with supporting experiments on topic classification. The speech-only setup provides the best results on the dialect data, while the text-only setup works best on the standard data. Although the cascaded systems lag behind the text-only models on standard German, they perform relatively well on the dialectal data if the transcription system generates normalized, standard-like output.
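The cascaded setting can be pictured as the composition of two independently built components: a transcriber that maps audio to text, and a text classifier that maps text to an intent label. The Python sketch below is a minimal illustration under stated assumptions. `make_cascade`, `toy_transcriber`, and `toy_keyword_classifier` are hypothetical names, the two components are toy stand-ins rather than the speech recognition or text models used in this work, and the example transcripts and intent labels are invented.

```python
from typing import Callable

# Hypothetical type aliases: audio is represented as raw bytes,
# and an intent label as a plain string.
Transcriber = Callable[[bytes], str]
TextClassifier = Callable[[str], str]


def make_cascade(transcribe: Transcriber, classify: TextClassifier) -> Callable[[bytes], str]:
    """Compose a transcription step and a text classifier into one speech classifier."""
    def cascade(audio: bytes) -> str:
        transcript = transcribe(audio)  # step 1: speech -> text
        return classify(transcript)     # step 2: text -> intent label
    return cascade


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_transcriber(audio: bytes) -> str:
    # Pretend the audio bytes already encode a normalized, standard-like transcript.
    return audio.decode("utf-8")


def toy_keyword_classifier(text: str) -> str:
    # A toy "standard-trained" text model: it only recognizes standard spellings.
    keywords = {"wecker": "set_alarm", "wetter": "weather_query"}
    for word in text.lower().split():
        if word in keywords:
            return keywords[word]
    return "unknown"


speech_intent = make_cascade(toy_transcriber, toy_keyword_classifier)
print(speech_intent("Wie ist das Wetter".encode("utf-8")))  # weather_query

# A hypothetical transcriber that keeps a non-standard spelling verbatim.
verbatim_intent = make_cascade(lambda audio: "wia is des weda", toy_keyword_classifier)
print(verbatim_intent(b""))  # unknown
```

Because the text model only ever sees the transcript, a transcriber that emits normalized, standard-like spellings hands it input closer to what it was trained on, whereas a verbatim non-standard transcript (here the invented spelling `weda`) falls outside its vocabulary.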