Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies are characterized by strong bonds within each group and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content, accompanied by the largest collection (13K raw sentences) of social media interactions that fall under the definition of four major violence classes and their 16 coarse expressions. Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists. From the data statistics and benchmark results on this dataset, we find that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text. Moreover, preliminary benchmarking with the state-of-the-art Bangla deep learning model substantiates the effectiveness of fine-tuning language models for identifying violent comments.