Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach

With growing demand for efficient and accurate text summarization, it is important to find ways to improve the quality and precision of pre-trained models for summarizing Bengali text. Numerous pre-trained transformer models are available for summarization, so it is challenging to determine which of the summaries they generate is the most informative and relevant for a given text. This paper aims to identify the most accurate and informative summary for a given text using a simple but effective ranking-based approach that compares the outputs of four different pre-trained Bengali text summarization models. The process begins by preprocessing the input text to remove unnecessary elements such as special characters and punctuation marks. Next, we use the four pre-trained summarization models to generate candidate summaries and apply a text ranking algorithm to identify the most suitable one. The summary with the highest ranking score is chosen as the final output. To evaluate the approach, the generated summaries are compared against human-annotated summaries using standard NLG metrics such as BLEU, ROUGE, BERTScore, WIL, WER, and METEOR. Experimental results suggest that, by leveraging the strengths of each pre-trained transformer model and combining them through a ranking-based approach, our methodology significantly improves the accuracy and effectiveness of Bengali text summarization.
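The pipeline described above (preprocess, generate candidates, rank, select) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not name the ranking algorithm, so the sketch substitutes a simple centrality score (each candidate's average Jaccard token overlap with the other candidates), and the summarization models are represented by hypothetical plain callables rather than real pre-trained checkpoints. The names `preprocess`, `rank_summaries`, and `select_summary` are invented for this example.

```python
import unicodedata
from typing import Callable, Dict, List, Tuple


def preprocess(text: str) -> str:
    """Replace punctuation and symbol characters with spaces, then collapse whitespace.

    Unicode categories are used so that Bengali letters and combining
    vowel signs are kept while marks such as the danda are removed.
    """
    kept = [
        " " if unicodedata.category(ch)[0] in ("P", "S") else ch
        for ch in text
    ]
    return " ".join("".join(kept).split())


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings (0.0 if either is empty)."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def rank_summaries(candidates: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score each candidate by its mean similarity to the other candidates.

    Assumption: this centrality score stands in for the unspecified
    "text ranking algorithm"; the paper may use something different.
    """
    scores = {}
    for name, summary in candidates.items():
        sims = [
            jaccard(summary, other)
            for other_name, other in candidates.items()
            if other_name != name
        ]
        scores[name] = sum(sims) / len(sims) if sims else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_summary(
    text: str, models: Dict[str, Callable[[str], str]]
) -> Tuple[str, str]:
    """Run every model on the cleaned text and return the top-ranked summary."""
    cleaned = preprocess(text)
    candidates = {name: model(cleaned) for name, model in models.items()}
    best_name = rank_summaries(candidates)[0][0]
    return best_name, candidates[best_name]


# Hypothetical stand-ins for four pre-trained summarizers.
stub_models = {
    "model_a": lambda t: "rain floods dhaka streets",
    "model_b": lambda t: "rain floods dhaka",
    "model_c": lambda t: "floods hit dhaka streets",
    "model_d": lambda t: "cricket match postponed",
}
best_name, best_summary = select_summary("Heavy rain floods Dhaka!", stub_models)
```

In this toy run the candidate that shares the most vocabulary with the others is selected, which reflects the intuition that a summary agreeing with several independent models is more likely to be reliable. A real setup would replace the stubs with calls to the actual models and could swap in a stronger similarity measure.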
