Research on cross-dialectal transfer from a standard variety to a non-standard dialect variety has typically focused on text data. However, dialects are primarily spoken, and non-standard spellings cause problems in text processing. We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems in which speech is first automatically transcribed and then processed by a text model. We focus on German dialects in the context of written and spoken intent classification, releasing the first dialectal audio intent classification dataset, with supporting experiments on topic classification. The speech-only setup provides the best results on the dialect data, while the text-only setup works best on the standard data. Although the cascaded systems lag behind the text-only models on standard German, they perform relatively well on the dialectal data if the transcription system generates normalized, standard-like output.