Bias in applications of machine learning (ML) to healthcare is usually attributed to unrepresentative or incomplete data, or to underlying health disparities. This article identifies a more pervasive source of bias that affects the clinical utility of ML-enabled prediction tools: target specification bias. Target specification bias arises when the operationalization of the target variable does not match the way decision makers define it. The mismatch is often subtle, and stems from the fact that decision makers are typically interested in predicting the outcomes of counterfactual, rather than actual, healthcare scenarios. Target specification bias persists independently of data limitations and health disparities. When left uncorrected, it leads to overestimation of predictive accuracy, inefficient utilization of medical resources, and suboptimal decisions that can harm patients. Recent work in metrology (the science of measurement) suggests ways of counteracting target specification bias and avoiding its harmful consequences.