In observational studies of discrimination, the most common statistical approaches consider either the rate at which decisions are made (benchmark tests) or the success rate of those decisions (outcome tests). Both tests, however, have well-known statistical limitations, sometimes suggesting discrimination even when there is none. Despite the fallibility of the benchmark and outcome tests individually, here we prove a surprisingly strong statistical guarantee: under a common non-parametric assumption, at least one of the two tests must be correct; consequently, when both tests agree, they are guaranteed to yield correct conclusions. We present empirical evidence that the underlying assumption holds approximately in several important domains, including lending, education, and criminal justice, and that our hybrid test is robust to the moderate violations of the assumption that we observe in practice. Applying this approach to 2.8 million police stops across California, we find evidence of widespread racial discrimination.
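The decision rule implied by this guarantee (report a conclusion only when the benchmark and outcome tests point in the same direction) can be sketched for two groups as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it compares point estimates only and ignores sampling uncertainty, it does not check the non-parametric assumption, the names `GroupStats`, `benchmark_test`, `outcome_test`, and `hybrid_test` are hypothetical, and the counts are made up.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    # Hypothetical per-group summary (illustrative, not the paper's data format).
    name: str
    n: int          # individuals at risk of the decision (e.g., stopped drivers)
    decided: int    # individuals who received the decision (e.g., were searched)
    succeeded: int  # decisions that succeeded (e.g., searches finding contraband)

    @property
    def decision_rate(self) -> float:
        return self.decided / self.n

    @property
    def success_rate(self) -> float:
        return self.succeeded / self.decided


def _sign(x: float) -> int:
    # Returns +1, -1, or 0 according to the sign of x.
    return (x > 0) - (x < 0)


def benchmark_test(a: GroupStats, b: GroupStats) -> int:
    # +1: a receives the decision more often than b (suggests bias against a).
    return _sign(a.decision_rate - b.decision_rate)


def outcome_test(a: GroupStats, b: GroupStats) -> int:
    # +1: decisions about a succeed less often than decisions about b, i.e. a
    # lower bar appears to be applied to a (suggests bias against a).
    return _sign(b.success_rate - a.success_rate)


def hybrid_test(a: GroupStats, b: GroupStats) -> str:
    # Report a conclusion only when both tests agree on the direction; if at
    # least one test is correct (the stated guarantee), agreement is reliable.
    bench, outcome = benchmark_test(a, b), outcome_test(a, b)
    if bench == outcome == 1:
        return f"discrimination against {a.name}"
    if bench == outcome == -1:
        return f"discrimination against {b.name}"
    return "inconclusive"


# Illustrative counts only (not real data).
a = GroupStats("group A", n=1000, decided=100, succeeded=20)
b = GroupStats("group B", n=1000, decided=50, succeeded=15)
print(hybrid_test(a, b))  # prints "discrimination against group A"
```

Here group A is searched twice as often (0.10 vs. 0.05) and its searches succeed less often (0.20 vs. 0.30), so both tests flag group A and the hybrid rule reports a conclusion; when the two tests disagree, the sketch returns "inconclusive" rather than trusting either test alone.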