Given Myanmar's historical and socio-political context, hate speech spread on social media has escalated into offline unrest and violence. This paper presents findings from our remote study on the automatic detection of online hate speech in Myanmar. We argue that effectively addressing this problem will require community-based approaches that combine the knowledge of context experts with machine learning tools that can analyze the vast amount of data produced. To this end, we develop a systematic process to facilitate this collaboration, covering key aspects of data collection, annotation, and model validation strategies. We highlight challenges in this area stemming from small and imbalanced datasets, the need to balance non-glamorous data work and stakeholder priorities, and closed data-sharing practices. Building on these findings, we discuss avenues for further work in developing and deploying hate speech detection systems for low-resource languages.