Online social networks offer vast opportunities for computational social science, but downstream tasks depend on effective user embeddings. Traditionally, researchers have relied on predefined network-based user features, such as degree and centrality measures, and/or content-based features, such as posts and reposts. However, these features may not capture the complex characteristics of social media users. In this study, we propose a user embedding method based on the URL domain co-occurrence network, which is simple yet effective for representing social media users in competing events. We assessed the method on binary classification tasks using benchmark datasets of Twitter users related to COVID-19 infodemic topics (QAnon, Biden, Ivermectin). User embeddings generated directly from the retweet network, and those based on language, performed below expectations. In contrast, our domain-based embeddings outperformed both while reducing computation time. These findings suggest that domain-based user embedding can serve as an effective tool for characterizing social media users who participate in competing events, such as political campaigns and public health crises.
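To make the idea of a URL domain co-occurrence network concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how domain vectors are learned or how they are aggregated per user, so two choices here are assumptions: domain vectors come from a truncated SVD of the log-scaled co-occurrence matrix, and a user's embedding is the mean of the vectors of the domains that user shared. The function names (`domain_of`, `domain_cooccurrence`, `embed_users`) and the sample users are hypothetical.

```python
from itertools import combinations
from urllib.parse import urlparse

import numpy as np


def domain_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_cooccurrence(user_urls):
    """Count, for each pair of domains, how many users shared both."""
    user_domains = {}
    for user, urls in user_urls.items():
        found = {domain_of(u) for u in urls}
        found.discard("")  # drop URLs without a parseable host
        user_domains[user] = sorted(found)

    domains = sorted({d for ds in user_domains.values() for d in ds})
    index = {d: i for i, d in enumerate(domains)}

    cooc = np.zeros((len(domains), len(domains)))
    for ds in user_domains.values():
        for a, b in combinations(ds, 2):
            cooc[index[a], index[b]] += 1
            cooc[index[b], index[a]] += 1
    return user_domains, index, cooc


def embed_users(user_urls, dim=2):
    """Embed users as the mean of the vectors of the domains they shared."""
    user_domains, index, cooc = domain_cooccurrence(user_urls)

    # Assumption: domain vectors from a truncated SVD of log(1 + counts).
    u, s, _ = np.linalg.svd(np.log1p(cooc))
    k = min(dim, len(index))
    domain_vecs = u[:, :k] * np.sqrt(s[:k])

    # Assumption: a user vector is the average of that user's domain vectors.
    return {
        user: domain_vecs[[index[d] for d in ds]].mean(axis=0)
        for user, ds in user_domains.items()
        if ds  # users who shared no URLs get no embedding
    }


# Hypothetical users and URLs, for illustration only.
sample = {
    "user_a": ["https://www.cdc.gov/x", "https://who.int/y"],
    "user_b": ["https://who.int/z", "https://cdc.gov/w"],
    "user_c": ["https://example.org/p", "https://example.net/q"],
}
embeddings = embed_users(sample, dim=2)
```

The resulting vectors could then be passed to any standard binary classifier. In this sketch, users who share the same set of domains receive identical embeddings, which is a consequence of the averaging assumption rather than a property claimed in the abstract.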