This paper explores the rapid development of a telephone call summarization system utilizing large language models (LLMs). Our approach involves initial experiments with prompting existing LLMs to generate summaries of telephone conversations, followed by the creation of a tailored synthetic training dataset utilizing stronger frontier models. We place special focus on the diversity of the generated data and on the ability to control the length of the generated summaries to meet various use-case-specific requirements. The effectiveness of our method is evaluated using two state-of-the-art LLM-as-a-judge evaluation techniques to ensure the quality and relevance of the summaries. Our results show that a fine-tuned Llama-2-7B-based summarization model performs on par with GPT-4 in terms of factual accuracy, completeness, and conciseness. Our findings demonstrate the potential for quickly bootstrapping a practical and efficient call summarization system.