Social media (SM) platforms (e.g., Facebook, Twitter, and Reddit) are increasingly used to share opinions and emotions, especially during challenging events such as natural disasters, pandemics, and political elections, as well as joyful occasions such as festivals and celebrations. Among these platforms, Reddit provides a unique space for users to anonymously share their experiences and thoughts on sensitive issues such as health and daily life. In this work, we present a novel dataset, called NepEMO, for multi-label emotion (MLE) classification and sentiment classification (SC) on Nepali subreddit posts. We curate and manually annotate a dataset of 4,462 posts (January 2019 to June 2025) written in English, Romanised Nepali, and Devanagari script, covering five emotions (fear, anger, sadness, joy, and depression) and three sentiment classes (positive, negative, and neutral). We perform a detailed analysis of the posts to capture linguistic insights, including emotion trends, co-occurrence of emotions, and sentiment-specific n-grams, together with topic modelling using Latent Dirichlet Allocation and TF-IDF keyword extraction. Finally, we compare various traditional machine learning (ML), deep learning (DL), and transformer models on the MLE and SC tasks. The results show that transformer models consistently outperform the ML and DL models on both tasks.
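To make the multi-label setting and the emotion co-occurrence analysis concrete, the following is a minimal sketch in plain Python. It assumes each post carries a set of emotion labels drawn from the five emotions named above; the toy annotations and the helper names `to_multi_hot` and `co_occurrence` are hypothetical illustrations, not NepEMO data or the authors' code.

```python
from collections import Counter
from itertools import combinations

# The five emotion labels named in the abstract, in a fixed order.
EMOTIONS = ["fear", "anger", "sadness", "joy", "depression"]

# Hypothetical toy annotations (NOT taken from NepEMO):
# each post is represented by the set of emotions assigned to it.
posts = [
    {"sadness", "depression"},
    {"anger"},
    {"fear", "sadness"},
    {"joy"},
    {"sadness", "depression", "fear"},
]


def to_multi_hot(labels):
    """Encode a set of emotion labels as a 0/1 vector aligned with EMOTIONS."""
    return [1 if emotion in labels else 0 for emotion in EMOTIONS]


def co_occurrence(label_sets):
    """Count how often each unordered pair of emotions appears on the same post."""
    counts = Counter()
    for labels in label_sets:
        # Sorting gives every pair a canonical (alphabetical) order.
        for first, second in combinations(sorted(labels), 2):
            counts[(first, second)] += 1
    return counts


vectors = [to_multi_hot(labels) for labels in posts]
pairs = co_occurrence(posts)

for pair, count in pairs.most_common():
    print(pair, count)
```

A multi-hot vector of this kind is the usual target representation for a multi-label classifier, since a single post can carry several emotions at once, while the pair counts correspond to the entries of an emotion co-occurrence matrix.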