Given a document in a source language, cross-lingual summarization (CLS) aims to generate a summary in a different target language. Recently, the emergence of Large Language Models (LLMs), such as GPT-3.5, ChatGPT and GPT-4, has attracted wide attention from the computational linguistics community. However, it is not yet known the performance of LLMs on CLS. In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms (i.e., end-to-end and pipeline), and provide a preliminary evaluation on the generated summaries. We find that ChatGPT and GPT-4 originally prefer to produce lengthy summaries with detailed information. These two LLMs can further balance informativeness and conciseness with the help of an interactive prompt, significantly improving their CLS performance. Experimental results on three widely-used CLS datasets show that GPT-4 achieves state-of-the-art zero-shot CLS performance, and performs competitively compared with the fine-tuned mBART-50. Moreover, we also find some multi-lingual and bilingual LLMs (i.e., BLOOMZ, ChatGLM-6B, Vicuna-13B and ChatYuan) have limited zero-shot CLS ability. Due to the composite nature of CLS, which requires models to perform summarization and translation simultaneously, accomplishing this task in a zero-shot manner is even a challenge for LLMs. Therefore, we sincerely hope and recommend future LLM research could use CLS as a testbed.
翻译:跨语言摘要(CLS)旨在将源语言文档生成为不同目标语言的摘要。近年来,GPT-3.5、ChatGPT和GPT-4等大型语言模型(LLMs)的出现引起了计算语言学界的广泛关注。然而,LLMs在跨语言摘要任务中的表现尚不明确。本报告通过设计多种提示策略,引导LLMs从不同范式(端到端与流水线)执行零样本跨语言摘要任务,并对生成的摘要进行了初步评估。研究发现,ChatGPT和GPT-4倾向于生成包含详细信息的冗长摘要。通过引入交互式提示,这两个LLMs能够进一步平衡信息量与简洁性,显著提升其跨语言摘要性能。在三个广泛使用的跨语言摘要数据集上的实验结果表明,GPT-4实现了最优的零样本跨语言摘要性能,并与经微调的mBART-50模型表现相当。此外,我们还发现部分多语言和双语LLMs(如BLOOMZ、ChatGLM-6B、Vicuna-13B和ChatYuan)的零样本跨语言摘要能力有限。由于跨语言摘要任务具有复合特性,要求模型同时执行摘要生成与翻译任务,这使得以零样本方式完成该任务对LLMs而言仍具挑战性。因此,我们诚挚期望并建议未来LLMs研究将跨语言摘要作为测试基准。