The rapid growth in user-generated content on social media has significantly increased the demand for automated content moderation. Various methods and frameworks have been proposed for hate speech detection and toxic comment classification. In this work, we combine common datasets to extend these tasks to brand safety. Brand safety aims to protect commercial branding by identifying contexts where advertisements should not appear; it covers not only toxicity but also other potentially harmful content. Because these datasets use different label sets, we approach the overall problem as a binary classification task. By applying common toxicity detection datasets to a subset of brand safety, we demonstrate the need for brand-safety-specific datasets, and we empirically analyze the effects of weighted sampling strategies in text classification.
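The two technical steps named above, collapsing different label sets into one binary label and sampling with weights, can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the toy examples, the label names, the `SAFE_LABELS` mapping, and the inverse-class-frequency weighting are hypothetical and are not taken from this work, which does not specify its datasets' label mapping or its weighting schemes here.

```python
import random
from collections import Counter

# Hypothetical toy datasets with different label sets (illustrative only).
dataset_a = [
    ("you are awful", "toxic"),
    ("nice day", "none"),
    ("great game", "none"),
    ("lovely photo", "none"),
]
dataset_b = [
    ("hateful remark", "hate"),
    ("rude remark", "offensive"),
    ("see you soon", "neither"),
    ("thanks a lot", "neither"),
]

# Assumed mapping: these labels count as safe (0); every other label is unsafe (1).
SAFE_LABELS = {"none", "neither"}


def to_binary(examples, safe_labels):
    """Collapse a dataset-specific label set into a shared binary label."""
    return [(text, 0 if label in safe_labels else 1) for text, label in examples]


def inverse_frequency_weights(examples):
    """Weight each example by 1 / (size of its class), so each class has total weight 1."""
    counts = Counter(label for _, label in examples)
    return [1.0 / counts[label] for _, label in examples]


def weighted_sample(examples, k, seed=0):
    """Draw k examples with replacement, proportionally to their weights."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(examples)
    return rng.choices(examples, weights=weights, k=k)


combined = to_binary(dataset_a, SAFE_LABELS) + to_binary(dataset_b, SAFE_LABELS)
batch = weighted_sample(combined, k=1000)
```

With this assumed weighting, both classes are drawn with roughly equal probability even though the combined toy data holds five safe and three unsafe examples. Other weighting schemes would change only `inverse_frequency_weights`.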