Diagnostic tests are almost never perfect. Studies quantifying their performance rely on knowledge of the true health status, which is measured with a reference diagnostic test. Researchers commonly assume that the reference test is perfect, which is not the case in practice. When the assumption fails, conventional studies identify "apparent" performance, that is, performance with respect to the reference, but not true performance. This paper provides the smallest possible bounds on the measures of true performance, namely sensitivity (true positive rate) and specificity (true negative rate), or equivalently the false negative and false positive rates, in standard settings. Implied bounds on two policy-relevant parameters are derived: 1) prevalence in screened populations; 2) predictive values. Methods for inference based on moment inequalities are used to construct confidence sets that are uniformly consistent in level over a relevant family of data distributions. Emergency Use Authorization (EUA) and independent study data for the BinaxNOW COVID-19 antigen test demonstrate that the bounds can be very informative. The analysis reveals that the estimated false negative rates for symptomatic and asymptomatic patients are up to 3.17 and 4.59 times higher, respectively, than the frequently cited "apparent" false negative rate.
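The gap between "apparent" and true performance can be illustrated with a small calculation. The sketch below is not the paper's bounding method, and every number in it is hypothetical rather than taken from the BinaxNOW data. It assumes a reference test that never flags a healthy person but misses 10% of sick people, and an index test that misses all of those same people. The function name `true_and_apparent_fnr` and the dictionary layout are assumptions made for illustration.

```python
def true_and_apparent_fnr(prevalence, joint_sick, joint_healthy):
    """Return (true FNR, apparent FNR) of an index test.

    joint_sick[(t, r)] and joint_healthy[(t, r)] hold the hypothetical
    probabilities P(index = t, reference = r | true health status).
    """
    p = prevalence
    # True false negative rate: index test negative among the truly sick.
    true_fnr = joint_sick[(0, 0)] + joint_sick[(0, 1)]
    # Apparent false negative rate: index test negative among reference positives.
    index_neg_ref_pos = p * joint_sick[(0, 1)] + (1 - p) * joint_healthy[(0, 1)]
    index_pos_ref_pos = p * joint_sick[(1, 1)] + (1 - p) * joint_healthy[(1, 1)]
    apparent_fnr = index_neg_ref_pos / (index_neg_ref_pos + index_pos_ref_pos)
    return true_fnr, apparent_fnr


# Hypothetical inputs (not from the paper): 10% prevalence.
joint_sick = {(1, 1): 0.8, (0, 1): 0.1, (1, 0): 0.0, (0, 0): 0.1}
joint_healthy = {(1, 1): 0.0, (0, 1): 0.0, (1, 0): 0.01, (0, 0): 0.99}

true_fnr, apparent_fnr = true_and_apparent_fnr(0.1, joint_sick, joint_healthy)
print(f"true FNR = {true_fnr:.3f}, apparent FNR = {apparent_fnr:.3f}")
print(f"ratio = {true_fnr / apparent_fnr:.2f}")
```

In this constructed case the true false negative rate is larger than the apparent one because the cases missed by the reference are also missed by the index test. With a different error structure the direction of the gap can reverse, which is why bounds rather than a single correction are needed when the reference's errors are not fully known.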