COVID-19 has brought about many changes in social dynamics. Stay-at-home orders and disruptions to school teaching can influence bullying behavior both in person and online, and both forms lead to negative outcomes for victims. To study cyberbullying specifically, 1 million tweets containing keywords associated with abuse were collected from the beginning of 2019 to the end of 2021 using the Twitter API search endpoint. A natural language processing model pre-trained on a Twitter corpus generated, for each tweet, the probabilities of it being offensive and hateful. To overcome the limitations of sampling, data was also collected using the count endpoint. The fraction of tweets in a given daily sample marked as abusive is multiplied by the number reported by the count endpoint to give an adjusted daily count. Once these adjusted counts are assembled, a Bayesian autoregressive Poisson model is used to study the mean trend and lag functions of the data and how they vary over time. The results reveal strong weekly and yearly seasonality in hateful speech, with slight differences across years that may be attributed to COVID-19.
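The count adjustment described above can be illustrated with a minimal sketch. It is not the authors' code: the function name, the input layout, and the 0.5 probability threshold used to mark a tweet as abusive are all assumptions made for illustration, and the numbers are toy values rather than collected data. Rounding to an integer is likewise an assumption, chosen because a Poisson model expects counts.

```python
from typing import Dict, List


def adjusted_abusive_counts(
    sample_probs: Dict[str, List[float]],
    total_counts: Dict[str, int],
    threshold: float = 0.5,  # hypothetical cutoff; the source does not state one
) -> Dict[str, int]:
    """Scale the abusive fraction of each daily sample by that day's total count.

    sample_probs maps a date string to the model's abuse probabilities for
    the sampled tweets of that day; total_counts maps a date string to the
    number reported by the count endpoint for the same day.
    """
    adjusted = {}
    for day, probs in sample_probs.items():
        # Skip days with an empty sample or no reported total.
        if not probs or day not in total_counts:
            continue
        flagged = sum(1 for p in probs if p >= threshold)
        fraction = flagged / len(probs)
        # Round to an integer so the result can feed a count model.
        adjusted[day] = round(fraction * total_counts[day])
    return adjusted


# Toy inputs for illustration only.
example_probs = {
    "2020-03-01": [0.9, 0.2, 0.7, 0.1],
    "2020-03-02": [0.1, 0.8, 0.3, 0.4],
}
example_totals = {"2020-03-01": 1200, "2020-03-02": 1000}
example_adjusted = adjusted_abusive_counts(example_probs, example_totals)
```

The resulting daily series is the kind of input a count model such as the Bayesian autoregressive Poisson model would then be fitted to.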