Background: The COVID-19 pandemic has affected our society and human well-being in various ways. In this study, we use real-world social media data to investigate how the pandemic has influenced people's emotions and psychological states compared with the pre-pandemic period. Method: We collected Reddit data from 2019 (pre-pandemic) and 2020 (pandemic) in the subreddit communities associated with eight universities. We applied the pre-trained Robustly Optimized BERT Pretraining Approach (RoBERTa) to learn text embeddings from the Reddit messages, and leveraged the relational information among posted messages to train a graph attention network (GAT) for sentiment classification. Finally, we applied model stacking to combine the prediction probabilities from RoBERTa and GAT into the final sentiment classification. Using the model-predicted sentiment labels on the collected data, we fit a generalized linear mixed-effects model to estimate the effects on sentiment of the pandemic and of in-person teaching during the pandemic. Results: The results suggest that the odds of negative sentiment in 2020 (pandemic) were 25.7% higher than in 2019 (pre-pandemic) ($p$-value $<0.001$), and that in 2020 the odds of negative sentiment associated with in-person learning were 48.3% higher than those associated with remote learning ($p$-value $= 0.029$). Conclusions: Our results are consistent with findings in the literature on the negative impacts of the pandemic on people's emotions and psychological states. Our study contributes to the growing real-world evidence on the pandemic's various negative impacts on society; it also provides an example of combining machine learning (ML) techniques with statistical modeling and inference to make better use of real-world data.
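The abstract does not say which meta-learner was used for model stacking. The following is a minimal sketch, assuming a binary target (negative vs. non-negative sentiment) and a logistic-regression meta-learner fit by plain gradient descent on held-out probabilities from the two base models. The function names `fit_stacker` and `predict_stacked` and the toy probabilities are hypothetical, not the study's code or data.

```python
import math
from typing import Sequence, Tuple

Params = Tuple[float, float, float]  # (bias, weight_roberta, weight_gat)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_stacker(p_roberta: Sequence[float], p_gat: Sequence[float],
                labels: Sequence[int], lr: float = 0.5,
                epochs: int = 2000) -> Params:
    """Hypothetical meta-learner: logistic regression on the two base models'
    predicted probabilities of negative sentiment, fit by batch gradient descent."""
    b = w1 = w2 = 0.0
    n = len(labels)
    for _ in range(epochs):
        gb = g1 = g2 = 0.0
        for x1, x2, y in zip(p_roberta, p_gat, labels):
            # Derivative of the log-loss with respect to the logit.
            err = sigmoid(b + w1 * x1 + w2 * x2) - y
            gb += err
            g1 += err * x1
            g2 += err * x2
        b -= lr * gb / n
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
    return b, w1, w2


def predict_stacked(params: Params, p_roberta: float, p_gat: float) -> float:
    """Stacked probability of negative sentiment for one message."""
    b, w1, w2 = params
    return sigmoid(b + w1 * p_roberta + w2 * p_gat)


# Toy held-out base-model probabilities (illustrative only, not study data).
toy_rob = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
toy_gat = [0.85, 0.6, 0.9, 0.3, 0.2, 0.1]
toy_y = [1, 1, 1, 0, 0, 0]  # 1 = negative sentiment

params = fit_stacker(toy_rob, toy_gat, toy_y)
print(predict_stacked(params, 0.9, 0.9), predict_stacked(params, 0.1, 0.1))
```

In practice the meta-learner should be fit on out-of-fold (or otherwise held-out) base-model predictions, so that it does not learn to trust a base model that has overfit the training set; scikit-learn's `StackingClassifier` automates this with internal cross-validation.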
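For a binary outcome, the mixed-effects model works on the log-odds scale, so each reported percentage is a transformed coefficient: if $\beta$ is a fixed-effect coefficient, the odds ratio is $e^{\beta}$ and the percent change in odds is $100(e^{\beta}-1)$. The sketch below assumes a logit link (not stated explicitly in the abstract), recovers the coefficients implied by the reported 25.7% and 48.3% figures, and uses a hypothetical baseline probability of 0.20 to show that a given increase in odds is a smaller relative increase in probability. The function names and the baseline value are illustrative only.

```python
import math


def percent_to_log_odds(percent_higher: float) -> float:
    """Log-odds coefficient beta implied by 'the odds are X% higher'."""
    return math.log(1.0 + percent_higher / 100.0)


def log_odds_to_percent(beta: float) -> float:
    """Percent change in odds implied by a log-odds coefficient beta."""
    return (math.exp(beta) - 1.0) * 100.0


def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Probability obtained by multiplying the baseline odds by odds_ratio."""
    odds = p_baseline / (1.0 - p_baseline) * odds_ratio
    return odds / (1.0 + odds)


# Coefficients implied by the reported effects (assuming a logit link).
beta_pandemic = percent_to_log_odds(25.7)    # 2020 vs. 2019
beta_in_person = percent_to_log_odds(48.3)   # in-person vs. remote, within 2020

# Hypothetical baseline: 20% of 2019 messages classified as negative.
p_2019 = 0.20
p_2020 = apply_odds_ratio(p_2019, math.exp(beta_pandemic))
print(f"beta_pandemic={beta_pandemic:.4f}, beta_in_person={beta_in_person:.4f}")
print(f"P(negative): {p_2019:.3f} -> {p_2020:.3f}")
```

Under this hypothetical baseline, 25.7% higher odds corresponds to a rise in probability from 0.20 to about 0.24, i.e. less than a 25.7% relative increase in probability, which is why the results are stated in terms of odds.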