MeetEval is an open-source toolkit for evaluating all kinds of meeting transcription systems. It provides a unified interface for computing commonly used Word Error Rates (WERs), specifically cpWER, ORC WER, and MIMO WER, along with other WER definitions. We extend the cpWER computation with a temporal constraint so that words are counted as correct only when their temporal alignment is plausible. This yields a matching of the hypothesis string to the reference string that more closely reflects the actual transcription quality, and it penalizes systems that provide poor time annotations. Since word-level timing information is often not available, we present a way to approximate word-level timings from segment-level timings (e.g., those of a sentence) and show that the approximation leads to a WER similar to the one obtained by matching with exact word-level annotations. At the same time, the time constraint speeds up the matching algorithm, and this speedup outweighs the additional overhead of processing the time stamps.
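To make the two ideas above concrete, the following is a minimal Python sketch; it is not MeetEval's implementation or API. It assumes that a segment's duration is split among its words in proportion to their character counts, and that two words may be matched or substituted only if their time intervals overlap after widening by a collar. The names `approximate_word_timings`, `overlaps`, `time_constrained_distance`, and `collar` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
TimedWord = Tuple[str, Interval]


def approximate_word_timings(words: List[str], start: float, end: float) -> List[Interval]:
    """Split the segment [start, end] among its words.

    Assumption (illustrative): each word's share of the segment is
    proportional to its number of characters.
    """
    total_chars = sum(len(w) for w in words)
    if total_chars == 0:
        return []
    duration = end - start
    intervals = []
    cursor = start
    for w in words:
        word_end = cursor + duration * len(w) / total_chars
        intervals.append((cursor, word_end))
        cursor = word_end
    return intervals


def overlaps(a: Interval, b: Interval, collar: float = 0.0) -> bool:
    """True if a overlaps b after widening b by `collar` seconds on each side."""
    return a[0] < b[1] + collar and b[0] - collar < a[1]


def time_constrained_distance(ref: List[TimedWord], hyp: List[TimedWord],
                              collar: float = 0.0) -> int:
    """Levenshtein distance with a temporal constraint.

    Assumption (illustrative): a reference word and a hypothesis word may be
    paired (as a correct match or a substitution) only if their intervals
    overlap; otherwise only insertions and deletions are allowed.
    """
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all reference words
    for j in range(1, m + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, n + 1):
        r_word, r_iv = ref[i - 1]
        for j in range(1, m + 1):
            h_word, h_iv = hyp[j - 1]
            best = min(d[i - 1][j], d[i][j - 1]) + 1  # deletion or insertion
            if overlaps(r_iv, h_iv, collar):
                # The diagonal step is allowed only for temporally plausible pairs.
                best = min(best, d[i - 1][j - 1] + (r_word != h_word))
            d[i][j] = best
    return d[n][m]


words = ["hello", "world"]
ref = list(zip(words, approximate_word_timings(words, 0.0, 2.0)))
hyp_good = list(zip(words, approximate_word_timings(words, 0.0, 2.0)))
hyp_late = list(zip(words, approximate_word_timings(words, 10.0, 12.0)))

print(time_constrained_distance(ref, hyp_good))  # 0: same words, plausible timing
print(time_constrained_distance(ref, hyp_late))  # 4: same words, implausible timing
```

In the second call the word sequences are identical, so an unconstrained Levenshtein distance would be zero; with the constraint, every word is counted as a deletion plus an insertion. This sketch fills the full dynamic-programming table and therefore does not show the speedup; a faster variant could skip cells whose intervals cannot overlap.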