Learning to Defer in Content Moderation: The Human-AI Interplay

Successful content moderation on online platforms relies on a human-AI collaboration approach. A typical heuristic estimates the expected harmfulness of a post and uses fixed thresholds to decide whether to remove it and whether to send it for human review. This disregards the prediction uncertainty, the time-varying nature of human review capacity and post arrivals, and the selective sampling in the dataset (humans only review posts filtered by the admission algorithm). In this paper, we introduce a model to capture the human-AI interplay in content moderation. The algorithm observes contextual information for incoming posts, makes classification and admission decisions, and schedules posts for human review. Only admitted posts receive human reviews of their harmfulness. These reviews help educate the machine-learning algorithms but are delayed due to congestion in the human review system. The classical learning-theoretic way to capture this human-AI interplay is via the framework of learning to defer, where the algorithm has the option to defer a classification task to humans for a fixed cost and immediately receive feedback. Our model contributes to this literature by introducing congestion in the human review system. Moreover, unlike work on online learning with delayed feedback, where the delay in the feedback is exogenous to the algorithm's decisions, the delay in our model is endogenous to both the admission and the scheduling decisions. We propose a near-optimal learning algorithm that carefully balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed posts, and the delay loss from congestion in the human review system. To the best of our knowledge, this is the first result for online learning in contextual queueing systems and hence our analytical framework may be of independent interest.
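
For concreteness, the typical fixed-threshold heuristic described above can be sketched as follows. This is a minimal illustration of the baseline, not the algorithm proposed in the paper. The names `Decision`, `threshold_heuristic`, `remove_threshold`, and `review_threshold`, as well as the default values, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    remove: bool          # classification decision: take the post down
    send_to_review: bool  # admission decision: queue the post for human review


def threshold_heuristic(
    harm_score: float,
    remove_threshold: float = 0.5,  # hypothetical value, for illustration only
    review_threshold: float = 0.3,  # hypothetical value, for illustration only
) -> Decision:
    """Fixed-threshold baseline: both decisions depend only on the point
    estimate of harmfulness, compared against constants."""
    remove = harm_score >= remove_threshold
    send_to_review = harm_score >= review_threshold
    return Decision(remove=remove, send_to_review=send_to_review)


# Example: a post with estimated harmfulness 0.4 stays up but is queued for review.
decision = threshold_heuristic(0.4)
```

Because both thresholds are constants, the rule takes no account of how uncertain the estimate is, how long the review queue currently is, or the fact that labels are collected only for posts that were admitted. These are the three gaps the abstract identifies.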
