Given a document in a source language, cross-lingual summarization (CLS) aims to generate a summary in a different target language. Recently, the emergence of Large Language Models (LLMs), such as GPT-3.5, ChatGPT and GPT-4, has attracted wide attention from the computational linguistics community. However, it is not yet known the performance of LLMs on CLS. In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms (i.e., end-to-end and pipeline), and provide a preliminary evaluation on the generated summaries. We find that ChatGPT and GPT-4 originally prefer to produce lengthy summaries with detailed information. These two LLMs can further balance informativeness and conciseness with the help of an interactive prompt, significantly improving their CLS performance. Experimental results on three widely-used CLS datasets show that GPT-4 achieves state-of-the-art zero-shot CLS performance, and performs competitively compared with the fine-tuned mBART-50. Moreover, we also find some multi-lingual and bilingual LLMs (i.e., BLOOMZ, ChatGLM-6B, Vicuna-13B and ChatYuan) have limited zero-shot CLS ability. Due to the composite nature of CLS, which requires models to perform summarization and translation simultaneously, accomplishing this task in a zero-shot manner is even a challenge for LLMs. Therefore, we sincerely hope and recommend future LLM research could use CLS as a testbed.
翻译:给定源语言文档,跨语言摘要旨在生成不同目标语言的摘要。近期,GPT-3.5、ChatGPT和GPT-4等大语言模型的出现吸引了计算语言学界的广泛关注。然而,LLMs在CLS任务上的表现尚不可知。在本报告中,我们通过实验性方法使用多种提示策略引导LLMs从不同范式(即端到端和流水线)执行零样本CLS,并对生成的摘要进行了初步评估。研究发现,ChatGPT和GPT-4初始倾向于生成包含详细信息的冗长摘要。借助交互式提示,这两个LLMs能够进一步平衡信息性与简洁性,显著提升CLS性能。在三个广泛使用的CLS数据集上的实验结果表明,GPT-4实现了最先进的零样本CLS性能,并与微调后的mBART-50模型表现相当。此外,我们还发现部分多语言及双语LLMs(如BLOOMZ、ChatGLM-6B、Vicuna-13B和ChatYuan)的零样本CLS能力有限。由于CLS任务要求模型同时完成摘要生成与翻译的复合特性,以零样本方式完成该任务对LLMs而言仍是挑战。因此,我们诚挚希望并建议未来的LLM研究可将CLS作为实验基准。