Social media offers a unique lens for observing large-scale spatiotemporal patterns in users' reactions to critical events. However, social media use varies across demographic groups, with younger users more prevalent than older ones. This difference biases the representativeness of the data, and analyses based on social media without proper adjustment will overlook the voices of digitally marginalized communities and produce inaccurate estimates. This study explores ways to identify and mitigate demographic biases in social media analysis through a case study that estimates public sentiment about COVID-19 from Twitter data. We analyzed pandemic-related Twitter data from the U.S. during 2020-2021 to (1) elucidate the uneven social media usage among demographic groups and the disparities in their sentiments toward COVID-19, and (2) construct an adjusted social-media-based measure of public sentiment, the Sentiment Adjusted by Demographics (SAD) index, to evaluate how public sentiment toward COVID-19 varied across space and time. The results show that higher proportions of female and adolescent Twitter users expressed negative emotions toward COVID-19. The SAD index reveals that public sentiment toward COVID-19 was most negative in January and February 2020 and most positive in April 2020. Vermont and Wyoming were, respectively, the most positive and the most negative states toward COVID-19.
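The abstract does not state how the SAD index is computed, so the following is only a minimal sketch of one generic way to adjust a sentiment estimate for demographic imbalance: post-stratification, in which each group's mean sentiment is weighted by its share of the general population rather than by its share of the sample. The function name `weighted_sentiment`, the age groups, and all numbers are hypothetical and are not taken from the study.

```python
def weighted_sentiment(group_sentiment, group_share):
    """Return the mean sentiment across groups, weighted by group_share.

    group_sentiment: dict mapping group name -> mean sentiment score
    group_share:     dict mapping group name -> share used as the weight
    Only groups present in both dicts are used; weights are renormalized.
    """
    groups = group_sentiment.keys() & group_share.keys()
    total = sum(group_share[g] for g in groups)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(group_sentiment[g] * group_share[g] for g in groups) / total


# Hypothetical inputs (assumptions for illustration only).
sentiment = {"18-29": -0.2, "30-49": 0.0, "50+": 0.2}          # mean score per group
sample_share = {"18-29": 0.50, "30-49": 0.30, "50+": 0.20}      # share among sampled users
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}  # share in the population

# Unadjusted estimate: groups weighted by how often they appear in the sample.
raw = weighted_sentiment(sentiment, sample_share)
# Adjusted estimate: the same group means, reweighted to the population.
adjusted = weighted_sentiment(sentiment, population_share)

print(round(raw, 3), round(adjusted, 3))
```

With these made-up numbers, the sample over-represents the youngest and most negative group, so the unadjusted estimate is lower than the population-weighted one. The actual SAD index may use different groups, weights, or sentiment scores.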