The concept of fairness is gaining attention in academia and industry. Social media is especially vulnerable to media bias and to toxic language and comments. We propose a fair ML pipeline that takes a text as input and determines whether it contains biases and toxic content. Then, based on pre-trained word embeddings, it suggests a set of new words to substitute for the biased words; the idea is to lessen the effects of those biases by replacing them with alternative words. We compare our approach to existing fairness models to determine its effectiveness. The results show that our proposed pipeline can detect, identify, and mitigate biases in social media data.
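The substitution step can be illustrated with a minimal sketch, which is not the pipeline's actual implementation. It assumes that biased words have already been flagged (here by a hypothetical lexicon `BIASED`, standing in for the detection stage) and that alternatives are chosen by cosine similarity in an embedding space. The toy three-dimensional vectors in `EMBEDDINGS` and the function names `suggest` and `suggest_replacements` are illustrative assumptions; a real system would load pre-trained embeddings instead.

```python
import math

# Toy 3-d vectors standing in for pre-trained word embeddings (made-up values).
EMBEDDINGS = {
    "hysterical": [0.9, 0.1, 0.3],
    "upset": [0.8, 0.2, 0.3],
    "emotional": [0.7, 0.3, 0.2],
    "table": [0.0, 0.9, 0.1],
}

# Hypothetical lexicon standing in for the bias-detection stage.
BIASED = {"hysterical"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def suggest(word, k=2):
    """Return up to k non-flagged vocabulary words closest to `word`."""
    if word not in EMBEDDINGS:
        return []
    target = EMBEDDINGS[word]
    scored = [
        (cosine(target, vec), candidate)
        for candidate, vec in EMBEDDINGS.items()
        if candidate != word and candidate not in BIASED
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [candidate for _, candidate in scored[:k]]


def suggest_replacements(text, k=2):
    """Map each flagged token in `text` to its suggested alternatives."""
    tokens = text.lower().split()
    return {tok: suggest(tok, k) for tok in tokens if tok in BIASED}


if __name__ == "__main__":
    print(suggest_replacements("She was hysterical today"))
```

Nearest-neighbour lookup alone does not guarantee that a suggestion is less biased than the original word, so in this sketch the candidates are only filtered against the same lexicon; a fuller pipeline would presumably re-score them with the bias detector.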