Speaker diarization is necessary for interpreting conversations transcribed using automated speech recognition (ASR) tools. Despite significant developments in diarization methods, diarization accuracy remains an issue. Here, we investigate the use of large language models (LLMs) for diarization correction as a post-processing step. LLMs were fine-tuned using the Fisher corpus, a large dataset of transcribed conversations. We measured the models' ability to improve diarization accuracy on a holdout dataset from the Fisher corpus as well as on an independent dataset. We report that fine-tuned LLMs can markedly improve diarization accuracy. However, model performance is constrained to transcripts produced by the same ASR tool that generated the fine-tuning transcripts, limiting generalizability. To address this constraint, we developed an ensemble model by combining the weights of three separate models, each fine-tuned on transcripts from a different ASR tool. The ensemble model demonstrated better overall performance than each of the ASR-specific models, suggesting that a generalizable, ASR-agnostic approach may be achievable. We have made the weights of these models publicly available on HuggingFace at https://huggingface.co/bklynhlth.