Video captioning in Nepali, a language written in the Devanagari script, presents a unique challenge due to the lack of existing academic work in this domain. To address this gap, this work develops a novel encoder-decoder paradigm for Nepali video captioning. Features are extracted from video frames using CNNs, and LSTM- and GRU-based sequence-to-sequence models then generate corresponding textual descriptions from these features. A Nepali video captioning dataset is generated from the Microsoft Research Video Description Corpus (MSVD) using Google Translate followed by manual post-editing. The model's performance is assessed with the BLEU, METEOR, and ROUGE metrics, which demonstrate its efficacy for Devanagari-scripted video captioning.
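As an illustration of one of the evaluation metrics, BLEU can be sketched in a few lines. This is a minimal sketch under stated assumptions: whitespace tokenization, sentence-level scoring, uniform weights over 1- to 4-grams, and no smoothing. The abstract does not specify which BLEU variant, tokenizer, or toolkit was used, so this is not the authors' evaluation code, and the helper names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights 1/max_n.

    Assumption: tokens are already split (e.g. on whitespace); the paper's
    actual BLEU configuration is not stated in the abstract.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:  # candidate shorter than n tokens
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:  # unsmoothed BLEU is zero if any precision is zero
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


# Illustrative Nepali caption ("a man is playing a guitar"), not taken from the dataset.
reference = "एक मान्छे गितार बजाउँदै छ".split()
candidate = "एक मान्छे गितार बजाउँदै".split()
print(round(sentence_bleu(candidate, [reference]), 4))  # 0.7788
```

Here the candidate drops the final word, so every n-gram precision stays at 1, but the brevity penalty exp(1 - 5/4) lowers the score to about 0.78. Corpus-level BLEU, which is more common for reporting results, instead pools clipped counts and lengths over all captions before combining them.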