Social division and polarization have become significant societal issues. Social media analysis offers computational social science a way to study the mechanisms behind these phenomena, and effective user embedding methods are essential for the analyses that follow. Traditionally, researchers have used predefined network-based user features (e.g., network size, degree, and centrality measures). However, because such measures may not capture the complex characteristics of social media users, we developed a method for embedding users based on a URL domain co-occurrence network. This approach effectively represents social media users involved in competing events such as political campaigns and public health crises. We assessed the method's performance on binary classification tasks, using datasets of Twitter users that covered topics associated with the COVID-19 infodemic, such as QAnon, Biden, and ivermectin. Our results revealed that user embeddings generated directly from the retweet network and/or based on language performed below expectations, whereas our domain-based embeddings outperformed those methods while reducing computation time. Therefore, domain-based embedding offers an accessible and effective method for characterizing social media users in competing events.
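To make the idea of a URL domain co-occurrence network concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how the network is built or how it is turned into user vectors, so everything here is an assumption for illustration: two domains are treated as co-occurring when the same user shared both, and a user's vector is the L2-normalized sum of the co-occurrence rows of the domains that user shared. The function names (`domain_of`, `cooccurrence`, `embed_users`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt
from urllib.parse import urlparse


def domain_of(url):
    """Return the host part of a URL, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def cooccurrence(user_urls):
    """Count, for each pair of domains, how many users shared both (assumed definition)."""
    counts = defaultdict(Counter)
    for urls in user_urls.values():
        domains = sorted({domain_of(u) for u in urls})
        for a, b in combinations(domains, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def embed_users(user_urls):
    """Embed each user as the L2-normalized sum of their domains' co-occurrence rows.

    This aggregation is an illustrative assumption, not the paper's method.
    """
    counts = cooccurrence(user_urls)
    vocab = sorted({domain_of(u) for urls in user_urls.values() for u in urls})
    index = {d: i for i, d in enumerate(vocab)}
    embeddings = {}
    for user, urls in user_urls.items():
        vec = [0.0] * len(vocab)
        for d in {domain_of(u) for u in urls}:
            vec[index[d]] += 1.0  # the shared domain itself
            for other, c in counts[d].items():
                vec[index[other]] += c  # domains that co-occur with it
        norm = sqrt(sum(x * x for x in vec)) or 1.0  # avoid division by zero
        embeddings[user] = [x / norm for x in vec]
    return vocab, embeddings


# Hypothetical toy input: user id -> URLs that user shared.
toy_user_urls = {
    "u1": ["https://www.a.com/x", "https://b.org/y"],
    "u2": ["https://a.com/z", "https://b.org/w"],
    "u3": ["https://c.net/q"],
}
vocab, embeddings = embed_users(toy_user_urls)
```

In this toy input, users who share the same pair of domains receive identical vectors, while a user who shares an unrelated domain receives an orthogonal one. The resulting vectors could then be passed to any binary classifier; which classifier and which features the study actually used is not stated in the abstract.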