Commentary gives readers a deep understanding of events by presenting diverse arguments and evidence. However, writing commentary is time-consuming, even for skilled commentators. Large language models (LLMs) have simplified natural language generation, but applying them directly to commentary creation remains challenging because of the task's unique requirements. These requirements fall into two levels: 1) fundamental requirements, which include creating well-structured and logically consistent narratives, and 2) advanced requirements, which involve generating high-quality arguments and providing convincing evidence. In this paper, we introduce Xinyu, an efficient LLM-based system designed to assist commentators in generating Chinese commentaries. To meet the fundamental requirements, we decompose the generation process into sequential steps and propose targeted strategies and supervised fine-tuning (SFT) for each step. To address the advanced requirements, we present an argument ranking model and establish a comprehensive evidence database that includes up-to-date events and classic books, thereby strengthening the supporting evidence through retrieval-augmented generation (RAG). To evaluate the generated commentaries more fairly against these two levels of requirements, we introduce a comprehensive evaluation metric that considers five distinct perspectives in commentary generation. Our experiments confirm the effectiveness of the proposed system. We also observe a significant increase in commentators' efficiency in real-world scenarios, with the average time spent creating a commentary dropping from 4 hours to 20 minutes. Importantly, this gain in efficiency does not compromise the quality of the commentaries.
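The abstract describes a pipeline in which candidate arguments are ranked and each selected argument is backed by evidence retrieved from a database of recent events and classic books. The following is only a minimal sketch of that control flow, not the authors' implementation: the names `Evidence`, `retrieve`, `rank_arguments`, and `build_outline` are hypothetical, and a simple token-overlap score stands in for both the trained argument ranking model and the RAG retriever, neither of which is specified in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str  # illustrative label such as "news" or "book" (assumption)
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenizer; a stand-in for a real encoder."""
    return set(text.lower().split())


def retrieve(query: str, database: list[Evidence], k: int = 1) -> list[Evidence]:
    """Return the k evidence items sharing the most tokens with the query."""
    q = tokenize(query)
    ordered = sorted(database, key=lambda e: len(q & tokenize(e.text)), reverse=True)
    return ordered[:k]


def rank_arguments(arguments: list[str], topic: str) -> list[str]:
    """Order candidate arguments by token overlap with the topic.

    Placeholder for a learned ranking model (assumption).
    """
    t = tokenize(topic)
    return sorted(arguments, key=lambda a: len(t & tokenize(a)), reverse=True)


def build_outline(topic: str, candidates: list[str],
                  database: list[Evidence], top_n: int = 2):
    """Keep the top_n ranked arguments and attach retrieved evidence to each."""
    ranked = rank_arguments(candidates, topic)[:top_n]
    return [(arg, retrieve(arg, database, k=1)) for arg in ranked]


# Toy data, invented purely for illustration.
topic = "city bike lanes safety"
candidates = [
    "taxes are high",
    "city budgets fund bike lanes",
    "bike lanes improve city safety",
]
database = [
    Evidence("news", "new bike lanes opened downtown"),
    Evidence("book", "studies show bike lanes improve safety"),
    Evidence("news", "city budgets fund new transit projects"),
]

outline = build_outline(topic, candidates, database)
for argument, evidence in outline:
    print(argument, "<-", [e.text for e in evidence])
```

In a real system, each stage would be replaced by the corresponding component (a fine-tuned LLM for generation steps, a trained ranker, and a dense or hybrid retriever); the sketch only shows how ranking and retrieval could be composed in sequence.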