Automatic text summarization has achieved high performance in high-resource languages like English, but comparatively little attention has been given to summarization in less-resourced languages. This work compares a variety of approaches to summarization, from zero-shot prompting of LLMs both large and small, to fine-tuning smaller models such as mT5 with and without three data augmentation approaches and multilingual transfer. We also explore an LLM translation pipeline: translating from the source language to English, summarizing, and translating back. Evaluating with five different metrics, we find that LLMs of similar parameter sizes vary in performance, that our multilingually fine-tuned mT5 baseline outperforms most other approaches, including zero-shot LLM prompting, on most metrics, and that LLM-as-judge evaluation may be less reliable on less-resourced languages.
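The translation pipeline described above can be sketched as follows. This is a minimal illustration only: `llm_translate` and `llm_summarize` are hypothetical placeholders for the actual LLM calls, not functions from the work itself.

```python
# Sketch of the translate-summarize-back-translate pipeline.
# Both helper functions below are hypothetical stand-ins for LLM calls.

def llm_translate(text: str, src: str, tgt: str) -> str:
    # Placeholder: in practice, prompt an LLM to translate `text`
    # from language `src` to language `tgt`.
    return f"[{src}->{tgt}] {text}"

def llm_summarize(text: str) -> str:
    # Placeholder: in practice, prompt an LLM for an English summary.
    return text[: max(1, len(text) // 2)]

def pipeline_summarize(document: str, source_lang: str) -> str:
    # 1. Translate the source-language document into English.
    english = llm_translate(document, source_lang, "en")
    # 2. Summarize in English, where LLMs perform best.
    summary_en = llm_summarize(english)
    # 3. Translate the summary back into the source language.
    return llm_translate(summary_en, "en", source_lang)
```

The appeal of this design is that both translation and summarization happen where the model is strongest (English), at the cost of compounding translation errors in both directions.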