Speaker diarization systems segment a conversation recording according to speaker identity. Such systems can attribute a portion of audio to the wrong speaker because of a variety of factors, such as variation in speech patterns, background noise, and overlapping speech. These errors propagate to downstream systems that rely on speaker identity, such as speaker-adapted speech recognition, and can degrade their performance. One way to mitigate these errors is to provide segment-level diarization confidence scores to downstream systems. In this work, we investigate multiple methods for generating diarization confidence scores, including methods derived from the original diarization system and methods derived from an external model. Our experiments across multiple datasets and diarization systems demonstrate that the most competitive confidence score methods can isolate ~30% of the diarization errors within the ~10% of segments with the lowest confidence scores.
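The headline evaluation (the share of diarization errors isolated in the lowest-confidence segments) can be illustrated with a small sketch. It assumes each segment carries a confidence score and the duration of misattributed speech it contains, and that the "lowest ~10%" is taken by segment count; the paper's exact error measure and ranking unit may differ. The names `Segment` and `error_captured_in_lowest` are hypothetical and used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical per-segment record; field names are illustrative only.
    confidence: float     # diarization confidence score for this segment
    error_seconds: float  # duration of misattributed speech inside the segment


def error_captured_in_lowest(segments, fraction=0.10):
    """Return the share of total diarization error (by duration) that falls
    in the lowest-confidence `fraction` of segments, ranked by segment count.
    This is an assumed formulation, not necessarily the paper's definition."""
    if not segments:
        return 0.0
    total_error = sum(s.error_seconds for s in segments)
    if total_error == 0:
        return 0.0
    # Sort ascending so the least confident segments come first.
    ranked = sorted(segments, key=lambda s: s.confidence)
    # Flag at least one segment, rounding the cutoff up.
    k = max(1, math.ceil(fraction * len(ranked)))
    flagged_error = sum(s.error_seconds for s in ranked[:k])
    return flagged_error / total_error


# Usage: ten segments; the single least confident one holds 3 s of the 10 s of error.
segs = [Segment(0.1, 3.0)]
segs += [Segment(0.5 + 0.05 * i, 1.0) for i in range(7)]
segs += [Segment(0.95, 0.0), Segment(0.99, 0.0)]
print(error_captured_in_lowest(segs))
```

Here flagging the lowest 10% of segments (one segment) captures 3 of the 10 seconds of error, a fraction of 0.3. A confidence score is useful to a downstream system to the extent that this fraction exceeds the flagged fraction; a score uncorrelated with errors would capture roughly 10% on average.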