The advent of large language models (LLMs) has significantly advanced natural language processing tasks such as text summarization. However, their size and computational demands, coupled with privacy concerns around data transmission, limit their use in resource-constrained and privacy-centric settings. To overcome these limitations, we introduce TriSum, a framework for distilling the text summarization abilities of LLMs into a compact, local model. First, LLMs extract a set of aspect-triple rationales and summaries, which are refined for quality using a dual-scoring method. Next, a smaller local model is trained on these tasks with a curriculum learning strategy that progresses from simple to complex tasks. Our method improves local model performance on three benchmarks (CNN/DailyMail, XSum, and ClinicalTrial), outperforming baselines by 4.5%, 8.5%, and 7.4%, respectively. It also improves interpretability by providing insight into the rationale behind each summary.
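To make the rationale-selection step concrete, the following is a minimal sketch of how a dual-scoring filter over LLM-generated candidates might look. The abstract does not specify the two scores, so everything here is an assumption for illustration: the `Candidate` structure, the use of a reference summary, the unigram-overlap metric, and the equal weighting are hypothetical stand-ins, not TriSum's actual scoring functions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One LLM-generated (rationale, summary) pair for a document (hypothetical structure)."""
    triples: list  # list of (subject, relation, object) string tuples
    summary: str


def token_overlap(a: str, b: str) -> float:
    """F1 over the sets of lowercase whitespace tokens; a crude stand-in for a real metric."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = len(tokens_a & tokens_b)
    if common == 0:
        return 0.0
    precision, recall = common / len(tokens_a), common / len(tokens_b)
    return 2 * precision * recall / (precision + recall)


def dual_score(cand: Candidate, reference: str, weight: float = 0.5) -> float:
    """Combine two assumed scores: summary quality and rationale-summary coherence."""
    # Score 1 (assumed): how close the candidate summary is to a reference summary.
    summary_score = token_overlap(cand.summary, reference)
    # Score 2 (assumed): how well the triples are reflected in the candidate summary.
    rationale_text = " ".join(" ".join(triple) for triple in cand.triples)
    rationale_score = token_overlap(rationale_text, cand.summary)
    return weight * summary_score + (1 - weight) * rationale_score


def select_best(candidates: list, reference: str) -> Candidate:
    """Keep the candidate with the highest combined score as training data."""
    return max(candidates, key=lambda c: dual_score(c, reference))
```

Under these assumptions, `select_best` would be called once per document on the set of candidates sampled from the LLM, and the retained pair would serve as a training target for the local model. A real pipeline would likely replace `token_overlap` with a stronger similarity measure.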