The negative effects of online bullying and harassment are growing as Internet use expands, especially on social media. One solution is to apply natural language processing (NLP) and machine learning (ML) methods to detect harmful remarks automatically, but such methods remain limited for low-resource languages such as the Chittagonian dialect of Bangla. This study focuses on detecting vulgar remarks in social media using supervised ML and deep learning algorithms. Logistic Regression achieved a promising accuracy of 0.91, while a simple RNN with Word2vec and fastText embeddings reached lower accuracy (0.84-0.90), highlighting that neural network algorithms require more data.
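As a rough illustration of the supervised setup summarized above, the following is a minimal sketch of a logistic regression classifier over bag-of-words counts, with accuracy as the evaluation metric. It is not the study's implementation: the bag-of-words features, the hyperparameters, all function names, and the toy sentences are assumptions made for illustration, and the tokens `badword1` and `badword2` are hypothetical placeholders rather than items from the study's data.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenization; a real pipeline would need dialect-aware preprocessing.
    return text.lower().split()

def build_vocab(texts):
    # Map each distinct training token to a feature index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    # Sparse bag-of-words counts; tokens outside the vocabulary are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    # Probability that the text is vulgar (label 1).
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))

def train(texts, labels, epochs=200, lr=0.5):
    # Plain stochastic gradient descent on the logistic loss (assumed settings).
    vocab = build_vocab(texts)
    vectors = [vectorize(t, vocab) for t in texts]
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            err = predict_proba(x, weights, bias) - y
            for i, v in x.items():
                weights[i] -= lr * err * v
            bias -= lr * err
    return vocab, weights, bias

def accuracy(texts, labels, vocab, weights, bias):
    # Fraction of texts whose thresholded prediction matches the label.
    correct = 0
    for t, y in zip(texts, labels):
        p = predict_proba(vectorize(t, vocab), weights, bias)
        correct += int((p >= 0.5) == bool(y))
    return correct / len(texts)

# Toy data with placeholder tokens (hypothetical, not from the study).
train_texts = [
    "you are a badword1",
    "what a lovely day",
    "badword2 go away",
    "thanks for the help",
    "such a badword1 badword2",
    "see you tomorrow friend",
]
train_labels = [1, 0, 1, 0, 1, 0]

vocab, weights, bias = train(train_texts, train_labels)
train_acc = accuracy(train_texts, train_labels, vocab, weights, bias)
p_vulgar = predict_proba(vectorize("badword1 again", vocab), weights, bias)
p_clean = predict_proba(vectorize("lovely day", vocab), weights, bias)
print(f"training accuracy: {train_acc:.2f}")
```

Because a linear model of this kind learns one weight per token, it can fit a small labeled corpus with comparatively few parameters, which is consistent with the observation above that the neural models need more data.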