The expanding influence of social media platforms over the past decade has changed the way people communicate. The anonymity that social media affords and the easy accessibility of the internet have facilitated the spread of hate speech. The terms and expressions associated with hate speech change over time, which poses an obstacle to policy-makers and researchers trying to identify it. As a growing number of individuals use their native languages to communicate with each other, hate speech in these low-resource languages is also growing. Although approaches for English are well known, these low-resource languages have received little attention because of the lack of datasets and online available data. This article provides a detailed survey of hate speech detection in low-resource languages around the world, with details of the available datasets, the features utilized, and the techniques used. The survey further discusses existing surveys, concepts that overlap with hate speech, research challenges, and opportunities.