This report introduces Dolphin, a large-scale multilingual automatic speech recognition (ASR) model that extends the Whisper architecture to support a wider range of languages. Our approach combines in-house proprietary and open-source datasets to optimize Dolphin's performance. The model is designed to achieve high recognition accuracy for 40 Eastern languages across East Asia, South Asia, Southeast Asia, and the Middle East, and it also supports 22 Chinese dialects. Experimental evaluations show that Dolphin significantly outperforms current state-of-the-art open-source models across various languages. To promote reproducibility and community-driven innovation, we are making our trained models and inference source code publicly available.