Research on Violent Text Detection System Based on BERT-fasttext Model

In the digital age of today, the internet has become an indispensable platform for people's lives, work, and information exchange. However, the problem of violent text proliferation in the network environment has arisen, which has brought about many negative effects. In view of this situation, it is particularly important to build an effective system for cutting off violent text. The study of violent text cutting off based on the BERT-fasttext model has significant meaning. BERT is a pre-trained language model with strong natural language understanding ability, which can deeply mine and analyze text semantic information; Fasttext itself is an efficient text classification tool with low complexity and good effect, which can quickly provide basic judgments for text processing. By combining the two and applying them to the system for cutting off violent text, on the one hand, it can accurately identify violent text, and on the other hand, it can efficiently and reasonably cut off the content, preventing harmful information from spreading freely on the network. Compared with the single BERT model and fasttext, the accuracy was improved by 0.7% and 0.8%, respectively. Through this model, it is helpful to purify the network environment, maintain the health of network information, and create a positive, civilized, and harmonious online communication space for netizens, driving the development of social networking, information dissemination, and other aspects in a more benign direction.

翻译：在当今数字化时代，互联网已成为人们生活、工作和信息交流不可或缺的平台。然而，网络环境中暴力文本泛滥的问题随之产生，带来了诸多负面影响。针对这一现状，构建有效的暴力文本阻断系统显得尤为重要。基于BERT-fasttext模型的暴力文本阻断研究具有重要意义。BERT是一种具有强大自然语言理解能力的预训练语言模型，能够深入挖掘和分析文本语义信息；Fasttext本身是一种高效的文本分类工具，复杂度低且效果良好，能够为文本处理快速提供基础判断。将两者结合并应用于暴力文本阻断系统，一方面能够准确识别暴力文本，另一方面能够高效合理地进行内容阻断，防止有害信息在网络中肆意传播。与单一的BERT模型和fasttext相比，准确率分别提升了0.7%和0.8%。通过该模型，有助于净化网络环境，维护网络信息健康，为网民营造积极、文明、和谐的线上交流空间，推动社交网络、信息传播等方面朝着更加良性的方向发展。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

专知会员服务

42+阅读 · 2020年5月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日