The 'Mahabharata' is among the most popular works of Indian literature and is referred to in many domains for quite different purposes. The text has many dimensions and aspects that are useful to people in both their personal and professional lives. This Indian epic was originally written in Sanskrit. In the era of natural language processing (NLP), artificial intelligence, machine learning, and human-computer interaction, the text can be processed according to the requirements of a given domain, and it is interesting to process it and extract useful insights. One limitation of human analysis of the Mahabharata is that readers always bring a sentimental attitude to the story narrated by the author. Moreover, humans cannot memorize statistical or computational details: Which two words frequently occur together in one sentence? What is the average sentence length across the whole work? Which word is the most frequent in the text? What are the lemmas of the words used in the sentences? In this paper, we therefore propose an NLP pipeline that extracts statistical and computational insights from the largest epic, the 'Mahabharata', together with a method for searching for the most relevant words. We stack different text-processing approaches to obtain the best results, which can then be used in the various domains where the Mahabharata needs to be referred to.
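As a rough illustration of the kind of statistics listed above, the following minimal sketch computes the average sentence length, the most frequent word, and the word pair that most often shares a sentence. It is not the pipeline proposed in the paper: the function names (`split_sentences`, `tokenize`, `corpus_stats`), the regex-based sentence splitter, and the English sample text are all assumptions made for this example. Lemmatization and stop-word filtering are left out because they would need an external library or word list.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens (apostrophes allowed inside)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def corpus_stats(text):
    """Return average sentence length, top word, and top co-occurring word pair."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    sentences = [tokens for tokens in sentences if tokens]

    word_counts = Counter(tok for tokens in sentences for tok in tokens)

    # Count each unordered pair of distinct words once per sentence.
    pair_counts = Counter()
    for tokens in sentences:
        pair_counts.update(combinations(sorted(set(tokens)), 2))

    if sentences:
        average_length = sum(len(tokens) for tokens in sentences) / len(sentences)
    else:
        average_length = 0.0

    return {
        "average_sentence_length": average_length,
        "most_common_word": word_counts.most_common(1),
        "most_common_pair": pair_counts.most_common(1),
    }


if __name__ == "__main__":
    # Hypothetical sample text, not taken from any actual translation.
    sample = "Arjuna took the bow. Krishna spoke to Arjuna. Arjuna and Krishna rode on."
    print(corpus_stats(sample))
```

On a full translation, function words such as "the" would dominate the frequency counts, so a real analysis would likely remove stop words first and map tokens to lemmas before counting.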