Due to the widespread use of data-powered systems in our everyday lives, concepts like bias and fairness have gained significant attention among researchers and practitioners in both industry and academia. Such issues typically emerge from the training data of supervised machine learning systems, which varies in quality. As such systems are commercialized and deployed, sometimes with the authority to make life-changing decisions, significant efforts are being made to identify and remove possible sources of data bias that may surface to end users or in the decisions being made. In this paper, we present research results that show how bias in data affects end users and where bias originates, and we offer a viewpoint on what should be done about it. We argue that data bias should not necessarily be removed in all cases, and that research attention should instead shift from bias removal towards identifying, measuring, indexing, surfacing, and adapting to bias, which we name bias management.