Remote communication through video and audio conferencing has become more popular than ever because of the worldwide pandemic. This shift has spurred the development of systems for automatic minuting of spoken language and led to the AutoMin 2021 challenge. This paper presents the results of the research carried out by team MTS while participating in the Automatic Minutes challenge. In particular, we analyze existing approaches to text and speech summarization, propose an unsupervised summarization technique based on clustering, and provide a pipeline that includes an adapted automatic speech recognition block able to run on real-life recordings. The proposed unsupervised technique outperforms pre-trained summarization models on the automatic minuting task, with ROUGE-1, ROUGE-2 and ROUGE-L values of 0.21, 0.02 and 0.2 on the dev set, and ROUGE-1, ROUGE-2, ROUGE-L, Adequacy, Grammatical correctness and Fluency values of 0.180, 0.035, 0.098, 1.857, 2.304 and 1.911, respectively, on the test set.
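For readers unfamiliar with the n-gram overlap scores reported above, the following is a minimal sketch of how a ROUGE-N score can be computed for one candidate summary against one reference. It is an illustration only, not the challenge's official scorer: the function names `ngrams` and `rouge_n`, the whitespace tokenization and the lowercasing are all assumptions, and the abstract does not state whether the reported values are recall, precision or F1, so the sketch returns all three.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) of clipped n-gram overlap.

    Assumption: plain whitespace tokenization and lowercasing,
    with no stemming or stop-word removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentences, used only to exercise the function.
    minutes = "the team discussed the budget"
    gold = "the team reviewed the budget plan"
    print(rouge_n(minutes, gold, n=1))
    print(rouge_n(minutes, gold, n=2))
```

ROUGE-L, also reported above, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.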