This paper presents a novel approach to evaluating text-based speaker diarization (SD), addressing the limitation that traditional metrics ignore contextual information in text. We propose two new metrics, Text-based Diarization Error Rate and Diarization F1, which perform utterance- and word-level evaluations by aligning tokens in the reference and hypothesis transcripts. Our metrics cover more types of errors than existing ones, enabling a more comprehensive analysis of SD. To align tokens, we introduce a multiple sequence alignment algorithm that supports multiple sequences in the reference while handling high-dimensional alignment to the hypothesis using dynamic programming. Our work is packaged into two tools: align4d, which provides an API for our alignment algorithm, and TranscribeView, which visualizes and evaluates SD errors. Together, they can greatly aid the creation of high-quality data, fostering the advancement of dialogue systems.
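To make the token-alignment idea concrete, the following is a minimal sketch of dynamic-programming alignment for the simplest case only: one reference sequence against one hypothesis sequence (the classic Needleman-Wunsch scheme). It is not the multiple sequence alignment algorithm described above and does not use the align4d API; the function name `align_tokens` and the scoring values are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align_tokens(ref: List[str], hyp: List[str],
                 match: int = 1, mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Globally align two token lists; None marks a gap on that side.

    Illustrative two-sequence case with assumed scoring values.
    """
    n, m = len(ref), len(hyp)

    def sub(i: int, j: int) -> int:
        # Substitution score for ref[i-1] vs hyp[j-1] (1-based DP indices).
        return match if ref[i - 1] == hyp[j - 1] else mismatch

    # score[i][j] = best score aligning ref[:i] with hyp[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((ref[i - 1], None))   # token missing from hypothesis
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))   # extra token in hypothesis
            j -= 1
    pairs.reverse()
    return pairs


result = align_tokens(["i", "am", "fine", "thanks"], ["i", "am", "thanks"])
```

In this example the hypothesis drops the token "fine", so the alignment pairs it with a gap. Extending this scheme to several reference speaker sequences adds one dimension to the table per sequence, which is the high-dimensional setting the abstract refers to.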