On December 7, 2020, Ghanaians went to the polls to elect their president for the next four years. To gain insight into this presidential election, we conducted stance analysis (which is not always equivalent to sentiment analysis) to understand how Twitter, a popular social media platform, reflected its users' opinions of the two main presidential candidates. We collected a total of 99,356 tweets through the Twitter API (via Tweepy) and manually annotated 3,090 of them into three classes: Against, Neutral, and Support. We then preprocessed the tweets. On the resulting dataset, we evaluated two lexicon-based approaches, VADER and TextBlob, and five supervised machine learning approaches: Support Vector Machine (SVM), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), Stochastic Gradient Descent (SGD), and Random Forest (RF), using metrics such as accuracy, precision, recall, and F1-score. Logistic Regression performed best, with an accuracy of 71.13%. We then used Logistic Regression to classify all the collected tweets and analyzed and discussed the results. Our data and code are available at: https://github.com/ShesterG/Stance-Detection-Ghana-2020-Elections.git
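To make the evaluation metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for the three stance classes. It is not taken from the authors' repository: the function name `evaluate`, the toy labels, and the choice of macro-averaging across classes are assumptions made for illustration, since the abstract does not state which averaging scheme was used.

```python
from collections import Counter

# The three stance classes named in the abstract.
LABELS = ("Against", "Neutral", "Support")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Hypothetical helper for illustration; macro-averaging is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        per_class[label] = (precision, recall, f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(v[0] for v in per_class.values()) / n,
        "macro_recall": sum(v[1] for v in per_class.values()) / n,
        "macro_f1": sum(v[2] for v in per_class.values()) / n,
        "per_class": per_class,
    }


if __name__ == "__main__":
    # Toy labels invented for this example; they are not from the dataset.
    gold = ["Against", "Against", "Neutral", "Neutral", "Support", "Support"]
    predicted = ["Against", "Neutral", "Neutral", "Neutral", "Support", "Against"]
    for name, value in evaluate(gold, predicted).items():
        print(name, value)
```

Macro-averaging weights each class equally, which can matter when the annotated classes are imbalanced; a support-weighted average would be an equally plausible choice.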