On December 7, 2020, Ghanaians went to the polls to elect their president for the next four years. To gain insight into this presidential election, we conducted stance analysis, which is not always equivalent to sentiment analysis, to understand how Twitter, a popular social media platform, reflected its users' opinions of the two main presidential candidates. We collected a total of 99,356 tweets through the Twitter API (using Tweepy) and manually annotated 3,090 of them into three classes: Against, Neutral, and Support. After preprocessing the tweets, we evaluated two lexicon-based approaches, VADER and TextBlob, and five supervised machine learning approaches, Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Stochastic Gradient Descent (SGD), and Random Forest (RF), on the resulting dataset, using metrics such as accuracy, precision, recall, and F1-score. Logistic Regression achieved the best performance, with an accuracy of 71.13%. We then used Logistic Regression to classify all the collected tweets and analyzed and discussed the results. Our data and code are available at: https://github.com/ShesterG/Stance-Detection-Ghana-2020-Elections.git
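To make the evaluation metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for the three stance classes. It is an illustration only, not the code in the repository: the function name `evaluate`, the macro averaging across classes, and the toy labels are all assumptions, since the abstract does not state how the per-class scores were averaged.

```python
from collections import Counter

# The three stance classes named in the abstract.
LABELS = ("Against", "Neutral", "Support")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy plus per-class and macro-averaged precision, recall, F1.

    Hypothetical helper; macro averaging is an assumption, not the paper's
    stated choice.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    per_class = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true)
    macro = {
        metric: sum(per_class[label][metric] for label in labels) / len(labels)
        for metric in ("precision", "recall", "f1")
    }
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro}


# Toy labels invented for illustration; they are not drawn from the dataset.
y_true = ["Support", "Support", "Against", "Neutral", "Against", "Neutral"]
y_pred = ["Support", "Neutral", "Against", "Neutral", "Support", "Neutral"]
result = evaluate(y_true, y_pred)
print(f"accuracy = {result['accuracy']:.4f}")
print(f"macro F1 = {result['macro']['f1']:.4f}")
```

Accuracy alone can hide weak performance on a minority class, which is why per-class precision, recall, and F1 are reported alongside it in a three-class setting such as this one.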