The ILSUM shared task focuses on text summarization for two major Indian languages, Hindi and Gujarati, along with English. In this task, we experiment with various pretrained sequence-to-sequence models to identify the best-performing model for each language. We present a detailed overview of the models and our approaches in this paper. We rank first in all three sub-tasks (English, Hindi, and Gujarati). We also analyze in detail the impact of k-fold cross-validation when training data is limited, and we run experiments on combinations of the original data and a filtered version of it to assess the efficacy of the pretrained models.