Social media have been deliberately used for malicious purposes, including political manipulation and disinformation. Most research focuses on high-resource languages. However, malicious actors share content across countries and languages, including low-resource ones. Here, we investigate whether and to what extent malicious actors can be detected in low-resource language settings. We find that a large number of accounts posting in Tagalog were suspended as part of Twitter's crackdown on interference operations after the 2016 US presidential election. By combining text embeddings and transfer learning, our framework can detect, with promising accuracy, malicious users posting in Tagalog without any prior knowledge of, or training on, malicious content in that language. We first learn an embedding model independently for each language: a high-resource one (English) and a low-resource one (Tagalog). Then, we learn a mapping between the two latent spaces to transfer the detection model. We demonstrate that the proposed approach significantly outperforms state-of-the-art models, including BERT, and yields marked advantages in settings with very limited training data, which is the norm when detecting malicious activity on online platforms.
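The transfer step described above (independent embeddings, then a mapping between the two latent spaces) can be illustrated with a minimal sketch. The abstract does not state how the mapping is learned, so this example assumes a linear orthogonal map fitted by Procrustes analysis on a set of hypothetical anchor pairs, and it substitutes a simple nearest-centroid detector for the actual detection model. All data are synthetic, and every name (`fit_orthogonal_map`, `fit_centroid_classifier`, `predict`, the anchor pairs) is illustrative rather than taken from the paper.

```python
import numpy as np


def fit_orthogonal_map(src, tgt):
    """Orthogonal Procrustes: W minimising ||src @ W - tgt||_F with W orthogonal."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def fit_centroid_classifier(x, y):
    """Stand-in detector (assumption): one mean vector per class (0 = benign, 1 = malicious)."""
    return np.stack([x[y == c].mean(axis=0) for c in (0, 1)])


def predict(centroids, x):
    """Assign each row of x to the nearest class centroid."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)
dim = 16
shift = np.full(dim, 1.5)  # synthetic offset separating the two classes

# Synthetic "English" user embeddings with labels (the high-resource side).
en_labels = rng.integers(0, 2, size=400)
en_users = rng.normal(size=(400, dim)) + en_labels[:, None] * shift

# Synthetic "Tagalog" space: the same geometry under an unknown rotation.
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

# Hypothetical anchor pairs: items whose vectors are known in both spaces.
en_anchors = rng.normal(size=(200, dim))
tl_anchors = en_anchors @ hidden_rotation + 0.01 * rng.normal(size=(200, dim))

# Learn the Tagalog -> English mapping from the anchors alone.
W = fit_orthogonal_map(tl_anchors, en_anchors)

# Train the detector on English only; no Tagalog labels are used for training.
centroids = fit_centroid_classifier(en_users, en_labels)

# Tagalog users, labelled here only so the transfer can be scored.
tl_labels = rng.integers(0, 2, size=300)
tl_users = (rng.normal(size=(300, dim)) + tl_labels[:, None] * shift) @ hidden_rotation

tl_pred = predict(centroids, tl_users @ W)
accuracy = float((tl_pred == tl_labels).mean())
print(f"zero-shot accuracy on synthetic Tagalog users: {accuracy:.3f}")
```

An orthogonal map is used here because it preserves distances, so a detector trained in the English space sees mapped Tagalog vectors with the same geometry. Real embedding spaces are only approximately related by such a map, so the result on synthetic data says nothing about the accuracy reported in the paper.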