Introduction: Microblogging websites have amassed rich data sources for sentiment analysis and opinion mining. Sentiment classification on this data has frequently proven ineffective, however, because users on these social networks tend not to write lengthy statements, so microblog posts typically lack syntactically consistent terms and representations. Low-resource languages pose additional limitations. Persian has distinctive characteristics that differ from the text features of English, so sentiment analysis in Persian demands its own annotated data and models. Method: This paper first constructs a user opinion dataset called ITRC-Opinion, built in-house in a collaborative environment. The dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram. Second, this study proposes a new deep convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts. The constructed dataset is used to evaluate the proposed model. Furthermore, several other models, such as LSTM, CNN-RNN, BiLSTM, and BiGRU, combined with different word embeddings, including fastText, GloVe, and Word2vec, are evaluated on our dataset. Results: The results demonstrate the benefit of our dataset and the proposed model (72% accuracy), showing a meaningful improvement in sentiment classification performance.