In this study, we present SeMaScore, an evaluation metric for automatic speech recognition tasks that is generated using a segment-wise mapping and scoring algorithm. SeMaScore leverages both the error rate and a more robust similarity score. We show that our algorithm's score generation improves upon the state-of-the-art BERTScore. Our experimental results show that SeMaScore corresponds well with expert human assessments, signal-to-noise ratio levels, and other natural language metrics. SeMaScore also computes 41x faster than BERTScore. Overall, we demonstrate that SeMaScore serves as a more dependable evaluation metric, particularly in real-world situations involving atypical speech patterns.