In doctor-patient conversations, identifying medically relevant information is crucial, which motivates conversation summarization. In this work, we first propose a deployable real-time speech summarization system, the first for real-world industry applications, which generates a local summary after every N speech utterances within a conversation and a global summary once the conversation ends. Our system can enhance user experience from a business standpoint while reducing computational costs from a technical perspective. Second, we present VietMed-Sum, which is, to our knowledge, the first speech summarization dataset for medical conversations. Third, we are the first to have LLMs and human annotators collaborate in creating gold-standard and synthetic summaries for medical conversation summarization. Finally, we report baseline results of state-of-the-art models on VietMed-Sum. All code, data (English-translated and Vietnamese), and models are available online: https://github.com/leduckhai/MultiMed