Social media platforms struggle to protect users from harmful content through content moderation. These platforms have recently turned to machine learning models to cope with the vast amount of content that users generate every day. Since moderation policies vary by country and product type, it is common to train and deploy a separate model per policy. However, this approach is highly inefficient, especially when policies change, because each change requires re-labeling the dataset and re-training the model on the shifted data distribution. To reduce this cost, social media platforms often employ third-party content moderation services that provide prediction scores for multiple subtasks, such as detecting underage individuals, rude gestures, or weapons, instead of directly providing final moderation decisions. However, how to make a reliable automated moderation decision for a specific target policy from the prediction scores of multiple subtasks has not yet been widely explored. In this study, we formulate real-world scenarios of content moderation and introduce a simple yet effective threshold optimization method that searches for the optimal thresholds of the multiple subtasks to make a reliable moderation decision in a cost-effective way. Extensive experiments demonstrate that our approach outperforms existing threshold optimization methods and heuristics in content moderation.
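To make the setting concrete, the following is a minimal sketch of turning multiple subtask scores into a single moderation decision by searching per-subtask thresholds. It is not the method proposed in this study. It is a plain coordinate-wise grid search, and several choices are assumptions made only for illustration: the OR decision rule (flag an item if any subtask score reaches its threshold), F1 as the objective, the grid of candidate thresholds, and all function names.

```python
from typing import List, Optional, Sequence, Tuple

DISABLED = 1.01  # assumed sentinel: a threshold above 1.0 never fires


def decide(scores: Sequence[float], thresholds: Sequence[float]) -> bool:
    """Flag an item if any subtask score reaches its threshold (assumed OR rule)."""
    return any(s >= t for s, t in zip(scores, thresholds))


def f1(rows: Sequence[Sequence[float]], labels: Sequence[int],
       thresholds: Sequence[float]) -> float:
    """F1 score of the thresholded decisions against policy labels (1 = violation)."""
    tp = fp = fn = 0
    for row, y in zip(rows, labels):
        pred = decide(row, thresholds)
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def search_thresholds(rows: Sequence[Sequence[float]], labels: Sequence[int],
                      grid: Optional[List[float]] = None,
                      max_rounds: int = 10) -> Tuple[List[float], float]:
    """Coordinate-wise grid search: tune one subtask threshold at a time."""
    n_subtasks = len(rows[0])
    grid = grid or [i / 20 for i in range(1, 21)] + [DISABLED]
    thresholds = [DISABLED] * n_subtasks  # start with every subtask switched off
    best = f1(rows, labels, thresholds)
    for _ in range(max_rounds):
        improved = False
        for k in range(n_subtasks):
            for cand in grid:
                trial = thresholds[:k] + [cand] + thresholds[k + 1:]
                val = f1(rows, labels, trial)
                if val > best:  # keep only strict improvements
                    best, thresholds, improved = val, trial, True
        if not improved:
            break
    return thresholds, best


# Toy data: columns are hypothetical subtask scores (underage, rude gesture, weapon).
# The hypothetical target policy bans underage individuals and weapons, not gestures.
ROWS = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.95, 0.2],
    [0.3, 0.2, 0.1],
    [0.7, 0.8, 0.1],
]
LABELS = [1, 0, 1, 0, 0, 1]

if __name__ == "__main__":
    found, score = search_thresholds(ROWS, LABELS)
    print(found, score)
```

On this toy data the search leaves the gesture subtask switched off, because the hypothetical policy does not treat gestures as violations. When a policy changes, only the thresholds need to be searched again on freshly labeled examples, while the subtask models stay untouched. A coordinate-wise search can stop at a local optimum, so it should be read as a baseline heuristic rather than a reliable optimizer.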