This paper describes the systems developed by SPRING Lab, Indian Institute of Technology Madras, for the ASRU MADASR 2.0 challenge. The systems focus on adapting ASR models to better predict the language and dialect of an utterance across 8 languages and 33 dialects. We participated in Track 1 and Track 2, which restrict the use of additional data and require multilingual systems to be developed from scratch. We present a novel training approach that uses a Multi-Decoder architecture with the phonemic Common Label Set (CLS) as an intermediate representation. This approach improves performance over the baseline in the CLS space. We also discuss several methods for retaining the gains obtained in the phonemic space when converting the output back to the corresponding grapheme representations. In Track 2, our systems outperform the baseline in WER/CER for 3 languages and achieve the highest language ID and dialect ID accuracy among all participating teams.