The Fediverse, a group of interconnected servers providing a variety of interoperable services (e.g. micro-blogging in Mastodon), has gained rapid popularity. This sudden growth, partly driven by Elon Musk's acquisition of Twitter, has however created challenges for administrators. This paper focuses on one such challenge: content moderation, e.g. the need to remove spam or hate speech. While centralized platforms like Facebook and Twitter rely on automated tools for moderation, those tools' dependence on massive labeled datasets and specialized infrastructure renders them impractical for decentralized, low-resource settings like the Fediverse. In this work, we design and evaluate FedMod, a collaborative content moderation system based on federated learning. Our system enables servers to exchange parameters of partially trained local content moderation models with similar servers, creating a federated model shared among collaborating servers. FedMod demonstrates robust performance on three content moderation tasks: harmful content detection, bot content detection, and content warning assignment, achieving average per-server macro-F1 scores of 0.71, 0.73, and 0.58, respectively.
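The parameter exchange described above follows the general federated-averaging pattern: each server trains a local model on its own labeled posts, and collaborating servers periodically average their parameters into a shared model. The following is a minimal illustrative sketch of that pattern only; the data, model (a toy logistic regression), and plain coordinate-wise averaging rule are assumptions for exposition, not FedMod's actual architecture or aggregation scheme.

```python
# Sketch of FedAvg-style collaboration: local training + parameter averaging.
# All names, data, and the averaging rule are illustrative assumptions.
import math
import random

def local_update(weights, data, lr=0.1):
    """One pass of logistic-regression SGD over a server's local labeled posts."""
    w = list(weights)
    for features, label in data:
        z = sum(wi * xi for wi, xi in zip(w, features))
        pred = 1.0 / (1.0 + math.exp(-z))
        w = [wi - lr * (pred - label) * xi for wi, xi in zip(w, features)]
    return w

def federated_average(models):
    """Aggregate partially trained local models by coordinate-wise averaging."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]

random.seed(0)
dim = 4
# Three collaborating servers, each holding a small private labeled set.
servers = [
    [([random.gauss(0, 1) for _ in range(dim)], random.randint(0, 1))
     for _ in range(20)]
    for _ in range(3)
]

global_w = [0.0] * dim
for _ in range(5):  # communication rounds
    local_models = [local_update(global_w, data) for data in servers]
    global_w = federated_average(local_models)
```

Note that only model parameters cross server boundaries; the raw posts (and their labels) never leave the server that holds them, which is what makes the approach suitable for the Fediverse's decentralized setting.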