The dominance of social media has given perpetrators additional channels for bullying. Cyberbullying (CB) is one of the most prevalent phenomena in today's cyber world and poses a severe threat to the mental and physical health of citizens. This creates a need for a robust system that detects and prevents bullying content on online forums, blogs, and social media platforms in order to limit its impact on society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performance is inconsistent because of high class imbalance and generalisation issues. In recent years, large language models (LLMs) such as BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Nevertheless, LLMs have not been applied extensively to CB detection. In this paper, we explore the use of these models for CB detection. We prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results on datasets D1 and D2 show that RoBERTa outperformed the other models.