This paper presents our work for the Violence Inciting Text Detection shared task at the First Workshop on Bangla Language Processing. Social media has accelerated the spread of hateful and violence-inciting speech, making efficient mechanisms to detect and curb such texts essential. Detection is harder still in low-resource settings, where prior research is sparse and labeled data is scarce. The shared task data consists of Bangla texts, each assigned to one of three categories defined by the type of violence-inciting content. We evaluate several BERT-based models and use an ensemble of them as our final submission. Our submission ranked 10th on the final leaderboard of the shared task with a macro F1 score of 0.737.
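To make the evaluation metric concrete, the following minimal sketch shows how a macro F1 score over three classes can be computed, together with one simple way of combining the predictions of several models. The majority-vote rule, the integer label ids `0`, `1`, `2`, and the function names are assumptions made for illustration only; the abstract does not specify how the ensemble is formed or how the categories are encoded.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by majority vote.

    Assumption: ties are broken in favour of the label predicted by the
    earliest model in the list. This is an illustrative rule, not
    necessarily the one used for the submission.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label (in model order) that reaches the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted examples contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical predictions from three models on four examples.
    model_preds = [[0, 1, 2, 1], [0, 1, 2, 2], [0, 2, 2, 1]]
    gold = [0, 1, 2, 2]
    ensemble = majority_vote(model_preds)
    print(ensemble, macro_f1(gold, ensemble, labels=[0, 1, 2]))
```

Because macro F1 averages the per-class scores without weighting by class frequency, a model that performs poorly on a rare category is penalized as much as one that performs poorly on a common category.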