Driven by the rapid progress of deep learning over the past decade, Automatic Speech Recognition (ASR) has attracted substantial attention, leading to numerous publicly accessible ASR systems that are actively being integrated into our daily lives. Nonetheless, impartial and reproducible evaluation of these ASR systems remains difficult because of various crucial subtleties. In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation. With this platform: (i) we report a comprehensive benchmark that presents the current state-of-the-art landscape for ASR systems, covering both open-source models and commercial industrial services; (ii) we quantify how distinct nuances in the scoring pipeline influence the final benchmark outcomes, including nuances related to capitalization, punctuation, interjections, contractions, synonym usage, compound words, etc., issues that have gained prominence in the transition towards an End-to-End future; (iii) we propose a practical modification to the conventional Token-Error-Rate (TER) evaluation metric, inspired by Kolmogorov complexity and Normalized Information Distance (NID). This adaptation, called modified-TER (mTER), achieves proper normalization and symmetrical treatment of reference and hypothesis. By leveraging this platform as a large-scale testing ground, this study demonstrates the robustness and backward compatibility of mTER when compared to TER. The SpeechColab Leaderboard is accessible at https://github.com/SpeechColab/Leaderboard
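To make the distinction between TER and mTER concrete, the following is a minimal sketch rather than the Leaderboard's actual implementation. It assumes that conventional TER divides the Levenshtein distance by the reference length, and that mTER divides it by the length of the longer of the two sequences; the abstract does not give the mTER formula, so this denominator is an assumption chosen only because it yields the normalization and symmetry properties described above. The `normalize` helper is likewise hypothetical and handles only capitalization and ASCII punctuation.

```python
import string


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference token
                curr[j - 1] + 1,         # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitute, or match at zero cost
            ))
        prev = curr
    return prev[-1]


def ter(ref, hyp):
    """Conventional TER: edits divided by the reference length."""
    if not ref:
        raise ValueError("TER is undefined for an empty reference")
    return edit_distance(ref, hyp) / len(ref)


def mter(ref, hyp):
    """ASSUMED form of mTER: edits divided by the longer sequence length."""
    longest = max(len(ref), len(hyp))
    return edit_distance(ref, hyp) / longest if longest else 0.0


def normalize(text):
    """Hypothetical minimal normalizer: lowercase, strip ASCII punctuation."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


if __name__ == "__main__":
    # Insertion-heavy hypothesis: TER exceeds 1, the assumed mTER does not.
    ref, hyp = ["yes"], ["yes", "yes", "yes"]
    print(ter(ref, hyp), mter(ref, hyp))

    # Swapping the arguments changes TER but leaves the assumed mTER unchanged.
    print(ter(hyp, ref), mter(hyp, ref))

    # Capitalization and punctuation alone can dominate the raw score.
    raw_ref, raw_hyp = "Hello, world.", "hello world"
    print(ter(raw_ref.split(), raw_hyp.split()))
    print(ter(normalize(raw_ref), normalize(raw_hyp)))
```

Under these assumptions the two metrics coincide whenever the hypothesis is no longer than the reference, which is one way to read the backward compatibility claim. The last two lines of the demo illustrate item (ii): the same pair of transcripts scores very differently depending on whether capitalization and punctuation are normalized before scoring.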