Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model

The enormous amount of data being generated on the web and social media has increased the demand for detecting online hate speech. Detecting hate speech can reduce its negative impact and influence on others. Much effort in the Natural Language Processing (NLP) domain has aimed to detect hate speech in general or specific types of hate speech, such as those targeting religion, race, gender, or sexual orientation. Hate communities tend to use abbreviations, intentional spelling mistakes, and coded words in their communication to evade detection, which makes hate speech detection more challenging. Word representation will therefore play an increasingly pivotal role in detecting hate speech. This paper investigates the feasibility of leveraging domain-specific word embedding in a Bidirectional LSTM-based deep model to automatically detect and classify hate speech. Furthermore, we investigate the use of a transfer-learning language model (BERT) on the hate speech problem, framed as a binary classification task. The experiments showed that domain-specific word embedding with the Bidirectional LSTM-based deep model achieved a 93% F1-score, while BERT achieved up to 96% F1-score on a balanced dataset combined from available hate speech datasets.
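For readers less familiar with the metric reported above, the following is a minimal sketch of how an F1-score is computed for a binary hate / non-hate classifier. The function name `f1_score_binary` and the toy labels are illustrative assumptions, not the paper's evaluation code or data.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class.

    Assumption: label 1 marks hate speech and label 0 marks non-hate text.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy, balanced example (hypothetical labels, 4 hate and 4 non-hate samples).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = f1_score_binary(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On a balanced dataset such as the one described, F1 and accuracy tend to be close, but F1 remains the more informative figure when the cost of missed hate speech and false alarms both matter.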
