While summarization has been extensively researched in natural language processing (NLP), cross-lingual cross-temporal summarization (CLCTS) is a largely unexplored area with the potential to improve cross-cultural accessibility, information sharing, and understanding. This paper comprehensively addresses the CLCTS task, including dataset creation, modeling, and evaluation. We build the first CLCTS corpus, leveraging historical fictional texts and Wikipedia summaries in English and German, and examine the effectiveness of popular end-to-end transformer models with different intermediate finetuning tasks. Additionally, we explore the potential of ChatGPT for CLCTS both as a summarizer and as an evaluator. Overall, we report evaluations from humans, ChatGPT, and several recent automatic evaluation metrics. We find that our end-to-end models with intermediate task finetuning generate summaries of poor to moderate quality. ChatGPT as a summarizer (without any finetuning) produces moderate to good quality outputs, and as an evaluator it correlates moderately with human evaluations, though it tends to assign lower scores. ChatGPT also appears to be very adept at normalizing historical text. Finally, we test ChatGPT in a scenario with adversarially attacked and unseen source documents and find that it handles omission and entity-swap attacks better than negations that contradict its prior knowledge.