Hate speech moderation remains a challenging task for social media platforms. Human-AI collaborative systems offer the potential to combine the reliability of human moderators with the scalability of machine learning to tackle this problem effectively. Existing methods for task handover in human-AI collaboration consider the costs of incorrect predictions, but insufficient attention has been paid to estimating these costs accurately. In this work, we propose a value-sensitive rejection mechanism that automatically rejects machine decisions and hands them over for human moderation, based on how users perceive the value of those decisions. We conduct a crowdsourced survey study with 160 participants to evaluate their perception of correct and incorrect machine decisions in hate speech detection, as well as of cases where the system declines to make a prediction. We introduce Magnitude Estimation, an unbounded scale, as the preferred method for measuring user (dis)agreement with machine decisions. Our results show that Magnitude Estimation can provide a reliable measurement of participants' perception of machine decisions. We further show that integrating user-perceived value into human-AI collaboration can guide us in 1) determining when to accept or reject machine decisions so as to obtain the optimal total value a model can deliver and 2) selecting better classification models than the more widely used target of model accuracy would.
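The idea of accepting or rejecting machine decisions so as to maximize total value can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: rejection is decided by comparing a model confidence score against a threshold, and each outcome (true positive, true negative, false positive, false negative, rejection) carries a single scalar value. The names `Values`, `total_value`, and `best_threshold`, the toy data, and all value numbers are hypothetical placeholders, not results from the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Values:
    """Hypothetical perceived value of each outcome (placeholder numbers)."""
    tp: float        # hateful content correctly flagged
    tn: float        # benign content correctly passed
    fp: float        # benign content wrongly flagged
    fn: float        # hateful content wrongly passed
    rejected: float  # decision handed over to a human moderator


def total_value(preds, labels, confs, threshold, v):
    """Sum the value of all decisions; items below the threshold are rejected."""
    total = 0.0
    for pred, label, conf in zip(preds, labels, confs):
        if conf < threshold:
            total += v.rejected
        elif pred == 1 and label == 1:
            total += v.tp
        elif pred == 0 and label == 0:
            total += v.tn
        elif pred == 1:
            total += v.fp
        else:
            total += v.fn
    return total


def best_threshold(preds, labels, confs, v, candidates):
    """Return the candidate threshold with the highest total value."""
    return max(candidates, key=lambda t: total_value(preds, labels, confs, t, v))


# Toy data: 1 = hateful, 0 = not hateful (made up for illustration).
preds = [1, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 1, 1, 0]
confs = [0.95, 0.55, 0.90, 0.60, 0.80, 0.70]
v = Values(tp=1.0, tn=1.0, fp=-2.0, fn=-3.0, rejected=-0.5)
candidates = [0.0, 0.5, 0.65, 0.75, 1.01]

if __name__ == "__main__":
    for t in candidates:
        print(t, total_value(preds, labels, confs, t, v))
    print("best threshold:", best_threshold(preds, labels, confs, v, candidates))
```

On this toy data, accepting every decision yields a total value of -1.0, while rejecting the two least confident decisions (threshold 0.65) yields 3.0. The same `total_value` function could be used to compare candidate models by the value they deliver at their best threshold rather than by accuracy alone.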