During training, language models can pick up linguistic signals that are particular to a specific sub-group of people. If a model learns to associate such language with a distinct group, any decision based on that language will correlate strongly with a decision based on the group's protected characteristic, which may lead to discrimination. We explore text simplification as a possible technique for bias mitigation. The motivating idea is that simplifying text should standardise language towards a single way of speaking while preserving its meaning. The experiment shows promising results: the accuracy of a classifier predicting the sensitive attribute drops by up to 17% on the simplified data.
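The evaluation idea described in the abstract (measuring how well a classifier can predict the sensitive attribute from text before and after simplification) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' setup: the toy sentences, the group labels "A" and "B", the dictionary-based `simplify` function standing in for a real simplification model, and the small Naive Bayes classifier are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for a text simplification model: it maps
# group-specific variants onto one standard form.
STANDARD_FORM = {"gonna": "going to", "ain't": "is not"}

def simplify(text):
    """Lowercase the text and replace known variants with the standard form."""
    return " ".join(STANDARD_FORM.get(tok, tok) for tok in text.lower().split())

def train_nb(examples):
    """Collect per-label word counts for a multinomial Naive Bayes classifier."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        word_counts[label].update(text.lower().split())
        label_counts[label] += 1
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    """Return the most probable label; ties go to the alphabetically first label."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for tok in text.lower().split():
            if tok in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def attribute_accuracy(train, test):
    """Accuracy of predicting the sensitive attribute (group label) from text."""
    model = train_nb(train)
    return sum(predict_nb(model, text) == label for text, label in test) / len(test)

# Toy data (invented): the two groups say the same things in different ways.
train = [("i am gonna go home", "A"), ("it ain't far", "A"),
         ("i am going to go home", "B"), ("it is not far", "B")]
test = [("i am gonna stay", "A"), ("it ain't here", "A"),
        ("i am going to stay", "B"), ("it is not here", "B")]

acc_original = attribute_accuracy(train, test)
acc_simplified = attribute_accuracy(
    [(simplify(t), y) for t, y in train],
    [(simplify(t), y) for t, y in test],
)
print(f"original: {acc_original:.2f}, simplified: {acc_simplified:.2f}")
```

The toy data is constructed so that simplification removes every group-specific word, which leaves the classifier with nothing to separate the groups on. Real text and a real simplification model would not behave this cleanly, so the size of any drop has to be measured on actual data.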