Toxicity classification of textual content remains a significant problem. Data labeled by a single annotator fail to capture the diversity of human perspectives, so there is a growing need to incorporate crowdsourced annotations when training a toxicity classifier. In addition, the standard approach of training a classifier with empirical risk minimization (ERM) may exploit spurious correlations and therefore fail to handle distribution shifts between the training and test sets. This work introduces a novel bi-level optimization framework that integrates crowdsourced annotations through soft labeling and optimizes the soft-label weights with Group Distributionally Robust Optimization (GroupDRO) to improve robustness against out-of-distribution (OOD) risk. We prove the convergence of our bi-level optimization algorithm. Experimental results show that our approach outperforms existing baseline methods in both average and worst-group accuracy, confirming that crowdsourced annotations can be leveraged for more accurate and more robust toxicity classification.
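The two ingredients named above, soft labels built from several annotators and a GroupDRO-style emphasis on the worst-performing group, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy data, the fixed annotator weights, and the step size `eta` are all assumptions made for illustration, and the bi-level loop that would learn the annotator weights is omitted.

```python
import numpy as np

def soft_labels(annotations, weights):
    """Combine binary annotator votes into one soft label per example.

    annotations: (n_examples, n_annotators) array of 0/1 votes.
    weights: (n_annotators,) non-negative weights summing to 1 (assumed given here).
    """
    return annotations @ weights

def group_losses(probs, targets, groups, n_groups):
    """Mean binary cross-entropy against soft targets, computed per group."""
    eps = 1e-12  # avoids log(0)
    ce = -(targets * np.log(probs + eps) + (1 - targets) * np.log(1 - probs + eps))
    return np.array([ce[groups == g].mean() for g in range(n_groups)])

def groupdro_step(q, losses, eta=0.1):
    """Exponentiated-gradient update: groups with higher loss gain weight."""
    q = q * np.exp(eta * losses)
    return q / q.sum()

# Toy data (hypothetical): 4 examples, 3 annotators, 2 groups.
annotations = np.array([[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0]], dtype=float)
weights = np.array([0.5, 0.3, 0.2])
targets = soft_labels(annotations, weights)

probs = np.array([0.9, 0.4, 0.5, 0.1])   # stand-in classifier outputs
groups = np.array([0, 0, 1, 1])           # group id of each example

losses = group_losses(probs, targets, groups, n_groups=2)
q = groupdro_step(np.full(2, 0.5), losses)
robust_loss = float(q @ losses)           # group-weighted objective
```

In a bi-level setup of the kind described, a quantity like `robust_loss` would drive the update of the annotator weights while the classifier itself is trained on the resulting soft labels; the sketch keeps both fixed to show only how the pieces are computed.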