Diagnostic tests are almost never perfect. Studies quantifying their performance rely on knowledge of the true health status, which is measured with a reference diagnostic test. Researchers commonly assume that the reference test is perfect, which is often not the case in practice. When the assumption fails, conventional studies identify "apparent" performance, that is, performance with respect to the reference, but not true performance. This paper provides the tightest possible bounds on the measures of true performance, namely sensitivity (true positive rate) and specificity (true negative rate), or equivalently the false negative and false positive rates, in standard settings. Implied bounds on two policy-relevant parameters are also derived: (1) prevalence in screened populations and (2) predictive values. Inference methods based on moment inequalities are used to construct confidence sets that are uniformly consistent in level over a relevant family of data distributions. Emergency Use Authorization (EUA) and independent study data for the BinaxNOW COVID-19 antigen test demonstrate that the bounds can be very informative. The analysis reveals that the estimated false negative rates for symptomatic and asymptomatic patients are up to 3.17 and 4.59 times higher, respectively, than the frequently cited "apparent" false negative rate. The paper also indicates how the results apply to other imperfect proxies, such as survey responses and imputed protected classes.
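To illustrate how bounds on sensitivity and specificity can translate into bounds on prevalence, here is a minimal sketch. It is not the paper's derivation. It assumes the bounds are supplied as simple intervals (the paper's identified sets may be tighter or have a different shape), and it uses the standard identity P(test positive) = sens * prev + (1 - spec) * (1 - prev), inverted for prev. The function names and all numbers are hypothetical and chosen only for illustration.

```python
from itertools import product


def prevalence_from_rates(pos_rate, sens, spec):
    """Invert P(t=1) = sens*prev + (1-spec)*(1-prev) for prev.

    Assumes sens + spec > 1 (the test is better than chance).
    The result is clipped to [0, 1].
    """
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("requires sens + spec > 1")
    prev = (pos_rate + spec - 1.0) / denom
    return min(1.0, max(0.0, prev))


def prevalence_bounds(pos_rate, sens_bounds, spec_bounds):
    """Outer bounds on prevalence given interval bounds on sens and spec.

    For a fixed value of one argument, the clipped expression is monotone
    in the other, so its extremes over the rectangle occur at the corners.
    """
    corners = [
        prevalence_from_rates(pos_rate, s, c)
        for s, c in product(sens_bounds, spec_bounds)
    ]
    return min(corners), max(corners)


# Hypothetical inputs: 10% of screened individuals test positive,
# sensitivity lies in [0.80, 0.90] and specificity in [0.98, 1.00].
lo, hi = prevalence_bounds(0.10, (0.80, 0.90), (0.98, 1.00))
print(f"prevalence in [{lo:.3f}, {hi:.3f}]")
```

In this hypothetical case the interval for prevalence is narrow even though neither sensitivity nor specificity is known exactly. Because the sketch treats the bounds as a rectangle, the result should be read as an outer bound rather than the sharp bound the paper targets.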