The ongoing Russia-Ukraine military conflict has exposed the pivotal role of social media in enabling the open, unfiltered sharing of information directly from the frontlines. In conflict zones where freedom of expression is constrained and information warfare is pervasive, social media has become an indispensable lifeline. Anonymous social media platforms, as publicly available sources of war-related information, have the potential to serve as effective instruments for monitoring and documenting Human Rights Violations (HRV). Our research analyzes data from Telegram, the leading social media platform for reading independent news in post-Soviet regions. We gathered a dataset of posts sampled from 95 public Telegram channels that cover politics and war news, and we use it to identify potential occurrences of HRV. Using an mBERT-based text classifier, we detect mentions of HRV in the Telegram data. Our final approach yields an $F_2$ score of 0.71 for HRV detection, an improvement of 0.38 over the multilingual BERT base model. We release two datasets of Telegram posts written in the context of the Russia-Ukraine war: (1) a large corpus of over 2.3 million posts and (2) a dataset annotated at the sentence level for HRV. We posit that our findings hold significant implications for NGOs, governments, and researchers by providing a means to detect and document possible human rights violations.
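For readers unfamiliar with the reported metric, the following is a minimal sketch of the standard $F_\beta$ score, which with $\beta = 2$ weights recall more heavily than precision. The function name `f_beta` and the precision and recall values in the usage lines are illustrative assumptions; they are not taken from the paper, which reports only the final $F_2$ score.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    With beta = 2 (the F_2 score), recall counts more than precision,
    which suits detection tasks where missing a true case is costly.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta**2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1 + beta**2) * precision * recall / denominator


# Hypothetical values chosen for illustration only (not results from the paper).
example_f2 = f_beta(precision=0.5, recall=1.0)          # recall-weighted
example_f1 = f_beta(precision=0.5, recall=1.0, beta=1)  # balanced
```

With these hypothetical inputs, the $F_2$ value is higher than the $F_1$ value because recall is the stronger of the two components and $F_2$ weights it more heavily.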