Regression models trained to predict the future activity of social media users need rich features to make accurate predictions. Many advanced models exist to generate such features; however, their computational cost is often prohibitive on very large datasets. Some studies have shown that simple semantic network features can be rich enough for regression without requiring complex computations. We propose a method for using semantic networks as user-level features for machine learning tasks. We conducted an experiment using a semantic network of 1037 Twitter hashtags built from a corpus of 3.7 million tweets related to the 2022 French presidential election. From the bipartite graph of users and hashtags, we form a weighted hashtag graph in which the hashtags are nodes and each edge weight is the number of Twitter users who interacted with both hashtags. This graph is then transformed into a maximum spanning tree, with the most popular hashtag as its root node, to construct a hierarchy among the hashtags. We then derive a feature vector for each user from this tree. To validate the usefulness of our semantic feature, we performed a regression experiment predicting each user's response rate for six emotions, including anger, enjoyment, and disgust. The semantic feature performs well in the regression, with $R^2$ above 0.5 for most emotions. These results suggest that our semantic feature could be considered for further experiments predicting social media response on large datasets.
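The graph and tree construction described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The following are all hypothetical:

* the input format, which is a dict mapping each user to the hashtags they interacted with;
* the use of Kruskal's algorithm;
* the definition of "most popular" as the hashtag used by the most distinct users;
* all function names.

The abstract does not say how the per-user feature vector is computed from the tree, so the sketch stops at the rooted hierarchy.

```python
from collections import defaultdict, deque
from itertools import combinations


def cooccurrence_weights(user_hashtags):
    """Edge {a, b} weight = number of users who interacted with both a and b.

    This is the hashtag-hashtag projection of the bipartite user-hashtag graph.
    """
    weights = defaultdict(int)
    for tags in user_hashtags.values():
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm with edges taken in descending weight order."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (a, b), w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        ra, rb = find(a), find(b)
        if ra != rb:  # skip edges that would close a cycle
            parent[ra] = rb
            tree.append((a, b, w))
    return tree  # a forest if the hashtag graph is disconnected


def root_tree(tree_edges, root):
    """Orient the tree away from `root`; returns {hashtag: (parent, depth)}.

    Only hashtags in the root's connected component are included.
    """
    adj = defaultdict(list)
    for a, b, _ in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    hierarchy = {root: (None, 0)}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in hierarchy:
                hierarchy[nxt] = (node, hierarchy[node][1] + 1)
                queue.append(nxt)
    return hierarchy


def build_hierarchy(user_hashtags):
    # Assumption: "most popular" = hashtag used by the most distinct users.
    popularity = defaultdict(int)
    for tags in user_hashtags.values():
        for t in set(tags):
            popularity[t] += 1
    root = max(sorted(popularity), key=popularity.get)  # ties: alphabetical
    weights = cooccurrence_weights(user_hashtags)
    tree = maximum_spanning_tree(popularity.keys(), weights)
    return root, root_tree(tree, root)


# Toy usage with made-up users (not data from the study).
users = {
    "u1": ["#Macron", "#LePen"],
    "u2": ["#Macron", "#LePen", "#Zemmour"],
    "u3": ["#Macron", "#Melenchon"],
    "u4": ["#Macron", "#LePen"],
}
root, hierarchy = build_hierarchy(users)
```

Taking edges in descending weight order keeps each hashtag's strongest co-occurrence links that do not form a cycle. The breadth-first pass then gives every hashtag a parent and a depth below the root. In the toy data, `#Zemmour` ends up under `#LePen` rather than directly under `#Macron`.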