Biased datasets are ubiquitous and pose a challenge for machine learning. When the categories in a dataset are equally important but some are sparsely represented while others are common, learning algorithms tend to favor the better-represented ones. The problem is especially sensitive when the data concern minority groups. How can we, from biased data, build algorithms that treat every person equally? This work explores one way to mitigate bias, using a debiasing variational autoencoder, with experiments on facial expression recognition.