Data equity is an emerging framework for responsible data science. However, its core concepts, including fairness, representativeness, and information bias, remain largely abstract and general, lacking the mathematical specificity needed for practical implementation. In this paper, we demonstrate how statisticians can operationalize data equity by translating its tenets into precise, testable formulations tailored to a given problem. Using the well-documented case of differential measurement error across racial groups in pulse oximetry, we first adopt an oracle approach, tracing how a single upstream information-bias violation compounds through the analytic pipeline into treatment disparities, fairness violations, and adverse health outcomes. We then demonstrate the inverse: starting from an observed outcome disparity, the data equity framework provides a principled structure for systematically identifying its statistical sources. Our exposition reveals that data equity, prediction equity, and decision equity are distinct requirements, each with its own evaluation and policy needs. This distinction highlights both the unique role of statisticians in the era of artificial intelligence and the necessity of interdisciplinary collaboration.
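To make the forward (oracle) trace more concrete, here is a minimal Python sketch under a hypothetical setup that is not taken from the paper. Two groups share the same distribution of true arterial oxygen saturation (SaO2). The device reading (SpO2) overestimates SaO2 by a fixed amount in one group, and treatment is triggered when SpO2 falls below a threshold. The function name `simulate_treatment_gap` and every numeric value (bias, noise, cutoffs, distribution parameters) are illustrative assumptions, not estimates. The sketch then compares, among truly hypoxemic patients, the share flagged for treatment in each group. This is an equal-opportunity (true-positive-rate) check.

```python
import numpy as np


def simulate_treatment_gap(n=100_000, bias_a=0.0, bias_b=2.0, noise_sd=1.5,
                           treat_threshold=90.0, hypoxemia_cutoff=88.0, seed=0):
    """Hypothetical toy model: differential measurement error -> unequal treatment.

    All defaults are illustrative assumptions, not values from the paper.
    Returns, per group, the share of truly hypoxemic patients flagged for treatment.
    """
    rng = np.random.default_rng(seed)
    # Identical true-saturation (SaO2) distributions in both groups (assumed).
    sao2_a = rng.normal(92.0, 4.0, n)
    sao2_b = rng.normal(92.0, 4.0, n)
    # Device reading (SpO2) = truth + group-specific bias + measurement noise.
    spo2_a = sao2_a + bias_a + rng.normal(0.0, noise_sd, n)
    spo2_b = sao2_b + bias_b + rng.normal(0.0, noise_sd, n)

    rates = {}
    for group, sao2, spo2 in (("A", sao2_a, spo2_a), ("B", sao2_b, spo2_b)):
        truly_hypoxemic = sao2 < hypoxemia_cutoff   # ground truth (oracle view)
        treated = spo2 < treat_threshold            # decision uses the biased reading
        # True-positive rate: P(treated | truly hypoxemic) within this group.
        rates[group] = float(treated[truly_hypoxemic].mean())
    return rates


rates = simulate_treatment_gap()
print(rates)
print("TPR gap (A - B):", rates["A"] - rates["B"])
```

Rerunning with `bias_b=0.0` should shrink the gap to sampling noise. In this toy setting, that isolates the upstream measurement bias as the source of the downstream treatment disparity.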