The proliferation of harmful content on online platforms is a major societal problem. Such content comes in many forms, including hate speech, offensive language, bullying and harassment, misinformation, spam, violence, graphic content, sexual abuse, self-harm, and many others. Online platforms seek to moderate this content to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users. Researchers have developed a variety of methods for automatically detecting harmful content, often focusing on specific sub-problems or on narrow communities, since what is considered harmful often depends on the platform and on the context. We argue that there is currently a dichotomy between the types of harmful content that online platforms seek to curb and the types that research efforts aim to detect automatically. In this light, we survey existing detection methods as well as the content moderation policies of online platforms, and we suggest directions for future work.