With the rapid growth of social media use, automatically collecting user feedback has become a crucial task for evaluating users' tendencies and behaviors online. Despite the wide availability of this information and the increasing number of Arabic-speaking users, only a few studies have addressed Arabic dialects. The purpose of this paper is to study the opinions and emotions expressed in real Moroccan texts, specifically YouTube comments, using well-known and commonly used sentiment analysis methods. We present our work on classifying Moroccan dialect comments with Machine Learning (ML) models, based on a YouTube Moroccan dialect dataset that we collected and manually annotated. Applying several text preprocessing and data representation techniques, we compare the classification results of the most commonly used supervised classifiers, namely k-nearest neighbors (KNN), Support Vector Machine (SVM), and Naive Bayes (NB), with those of deep learning (DL) classifiers such as Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Experiments were performed on both raw and preprocessed data to show the importance of preprocessing. The experimental results show that DL models perform better on the Moroccan dialect than classical approaches, and we achieved an accuracy of 90%.
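As a rough illustration of the kind of pipeline the abstract describes (text preprocessing followed by a classical supervised classifier), the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the cleaning rules (URL removal, diacritic and tatweel stripping, collapsing elongated letters, dropping punctuation), the `preprocess` function, the `NaiveBayes` class, and the four toy training comments are all assumptions made for illustration and are not taken from the paper or its dataset.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed cleaning rules (illustrative only; the paper's exact steps are not given here).
URL = re.compile(r"https?://\S+")
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # Arabic diacritics (tashkeel) and tatweel
ELONGATION = re.compile(r"(.)\1{2,}")              # any character repeated 3+ times
NON_WORD = re.compile(r"[^\w\s]")                  # punctuation, emoji, other symbols


def preprocess(text):
    """Clean a comment and return a list of whitespace-separated tokens."""
    text = URL.sub(" ", text)
    text = DIACRITICS.sub("", text)
    text = ELONGATION.sub(r"\1", text)  # collapse a run of 3+ repeats to one character
    text = NON_WORD.sub(" ", text)
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy comments, NOT drawn from the paper's dataset.
toy_train = [
    ("زوين بزاف", "pos"),
    ("فيديو زوين", "pos"),
    ("خايب بزاف", "neg"),
    ("فيديو خايب", "neg"),
]
model = NaiveBayes().fit(
    [preprocess(text) for text, _ in toy_train],
    [label for _, label in toy_train],
)

raw = "زوييييين!!! https://example.com"
tokens = preprocess(raw)
print(tokens, model.predict(tokens))
```

In this toy setting, the elongated spelling in `raw` only matches a word in the training vocabulary after normalization, which hints at why results on raw and preprocessed data can differ. A real comparison would need the annotated dataset, a held-out test split, and the other classifiers named above.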