Text summarization is the process of condensing a piece of text into fewer sentences while preserving its content. A chat transcript, in this context, is a textual copy of a digital or online conversation between a customer (caller) and one or more agents. This paper presents an indigenously (locally) developed hybrid method that first combines extractive and abstractive summarization techniques to compress ill-punctuated or unpunctuated chat transcripts into more readable, punctuated summaries, and then optimizes the overall quality of summarization through reinforcement learning. Extensive testing, evaluation, comparison, and validation have demonstrated the efficacy of this approach for large-scale deployment of chat transcript summarization in the absence of manually generated reference (annotated) summaries.