Cross-lingual summarization (CLS) is a challenging branch of Natural Language Processing that requires models to both translate and summarize articles across source languages accurately. Despite improvements from subsequent studies, the area still lacks data-efficient solutions and effective training methodologies. To the best of our knowledge, there is no feasible solution for CLS when high-quality CLS data is unavailable. In this paper, we propose ConVerSum, a novel data-efficient approach to CLS that leverages contrastive learning: it generates versatile candidate summaries in different languages from a given source document, contrasts them against the reference summary for that document, and trains the model with a contrastive ranking loss. We then rigorously evaluate the proposed approach against current methodologies and compare it with powerful Large Language Models (LLMs), namely Gemini, GPT-3.5, and GPT-4o, showing that our model performs better on CLS for low-resource languages. These findings represent a substantial improvement in the area, opening the door to more efficient and accurate cross-lingual summarization techniques.
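For readers unfamiliar with contrastive ranking objectives, the sketch below illustrates one common (BRIO-style) formulation in PyTorch; it is a minimal illustration, not the paper's exact objective. The function name, the margin schedule, and the assumption that candidates are pre-sorted by similarity to the reference summary are all illustrative choices introduced here.

```python
import torch


def contrastive_ranking_loss(scores: torch.Tensor, margin: float = 0.01) -> torch.Tensor:
    """Margin-based contrastive ranking loss over candidate summaries.

    `scores` holds the model's score for each candidate (e.g., a
    length-normalized log-likelihood), pre-sorted so index 0 is the
    candidate most similar to the reference summary. The loss pushes
    the model to score better-ranked candidates higher, with a margin
    that grows with the rank gap (a BRIO-style assumption).
    """
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # Candidate i outranks candidate j, so its score should
            # exceed scores[j] by at least (j - i) * margin.
            loss = loss + torch.clamp(scores[j] - scores[i] + (j - i) * margin, min=0.0)
    return loss
```

In such a setup, the candidate scores would typically come from the summarization model itself, while the ranking used to sort them would come from comparing each candidate against the reference summary (e.g., with ROUGE or a multilingual embedding similarity).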