The development of fair and ethical AI systems requires careful attention to bias mitigation, an area that is often overlooked. In this study, we introduce Targeted Data Augmentation (TDA), a novel and efficient approach to bias mitigation that leverages classical data augmentation techniques to address bias in both data and models. Rather than undertaking the laborious task of removing biases, our method deliberately inserts them during training, which reduces the models' reliance on those biases. To identify biases, we annotated two diverse datasets: a dataset of clinical skin lesions and a dataset of male and female faces. These bias annotations are published for the first time in this study, providing a valuable resource for future research. Using Counterfactual Bias Insertion, we found that biases associated with frames, rulers, and glasses had a significant impact on the models. By randomly introducing these biases during training, we mitigated them and achieved a substantial decrease in bias measures, ranging from two-fold to more than 50-fold, with only a negligible increase in error rate.
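The two mechanisms described above can be sketched as follows. The first randomly inserts a bias into training images. The second measures a model's sensitivity to a bias by checking how often its prediction changes once that bias is inserted. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. A black border stands in for the "frame" bias, and the names `add_black_frame`, `targeted_augment`, and `switch_rate`, the insertion probability `p`, the frame thickness, and the use of the fraction of switched predictions as the bias measure are all hypothetical choices made for illustration.

```python
import numpy as np


def add_black_frame(image: np.ndarray, thickness: int) -> np.ndarray:
    """Hypothetical bias insertion: paint a black border of `thickness` pixels.

    Works on HxW or HxWxC arrays; the input image is not modified.
    """
    if thickness <= 0:
        # Guard: a slice like out[-0:] would select the whole image.
        raise ValueError("thickness must be positive")
    out = image.copy()
    out[:thickness, :] = 0
    out[-thickness:, :] = 0
    out[:, :thickness] = 0
    out[:, -thickness:] = 0
    return out


def targeted_augment(image: np.ndarray, rng: np.random.Generator,
                     p: float = 0.5, thickness: int = 8) -> np.ndarray:
    """Training-time augmentation: insert the frame bias with probability p."""
    if rng.random() < p:
        return add_black_frame(image, thickness)
    return image


def switch_rate(predict, images, insert_bias) -> float:
    """Counterfactual check: fraction of images whose predicted label
    changes after the bias is inserted (higher means more bias-sensitive)."""
    original = np.array([predict(x) for x in images])
    counterfactual = np.array([predict(insert_bias(x)) for x in images])
    return float(np.mean(original != counterfactual))
```

As a usage example, a toy classifier that thresholds the mean intensity of the whole image is fully sensitive to an 8-pixel frame on 32x32 images, while one that looks only at the central crop is unaffected:

```python
rng = np.random.default_rng(0)
images = rng.uniform(0.55, 1.0, size=(10, 32, 32))
frame = lambda x: add_black_frame(x, 8)
biased = lambda x: int(x.mean() > 0.5)
robust = lambda x: int(x[8:-8, 8:-8].mean() > 0.5)
print(switch_rate(biased, images, frame))  # 1.0
print(switch_rate(robust, images, frame))  # 0.0
```

In this sketch, applying `targeted_augment` inside a training loop exposes the model to framed and unframed versions of the same classes, so the frame stops being a useful shortcut; `switch_rate` can then be compared before and after training to quantify the change.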