Summarization of multi-party dialogues is a critical capability in industry, enhancing knowledge transfer and operational effectiveness across many domains. However, automatically generating high-quality summaries is challenging, as the ideal summary must satisfy a set of complex, multi-faceted requirements. While summarization has received immense attention in research, prior work has relied primarily on static datasets and benchmarks, an assumption rarely met in practice, where requirements inevitably evolve. In this work, we present an industry case study on developing an agentic system to summarize multi-party interactions. We share practical insights spanning the full development lifecycle to guide practitioners in building reliable, adaptable summarization systems and to inform future research, covering: 1) robust evaluation methods despite evolving requirements and task subjectivity, 2) component-wise optimization enabled by the task decomposition inherent in an agentic architecture, 3) the impact of upstream data bottlenecks, and 4) the realities of vendor lock-in caused by the poor transferability of LLM prompts.