With the advent of high-throughput screening, it has become increasingly common for studies to devote limited resources to estimating many parameters imprecisely rather than estimating a few parameters well. In these studies, each parameter is measured by only two or three independent replicates, which makes it challenging to assess the variance of the measurements. One solution is to pool variance estimates across parameters using a parametric model of estimator error. However, such models are difficult to specify correctly, especially in the presence of ``batch effects.'' In this paper, we propose new model-free methods for assessing and controlling estimator error. Our focus is on type S error (reporting an effect with the wrong sign), which is of particular importance in many settings. To produce tight confidence intervals without making unrealistic assumptions, we improve on Hoeffding's bounds for sums of bounded random variables and obtain the tightest possible Chernoff-Cram\'er bound. Our methods compare favorably with existing practice for high-throughput screening, such as methods based on the Irreproducible Discovery Rate (IDR) and the Benjamini-Hochberg procedure, which fails to control error at the nominal level in some cases and is needlessly conservative in others.
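To illustrate why tight bounds matter with so few replicates, the following minimal sketch computes the half-width of a confidence interval from the classical Hoeffding inequality, which is the baseline the abstract says is improved upon and not the paper's own bound. The function name `hoeffding_half_width` and the example range and level are assumptions chosen for illustration.

```python
import math


def hoeffding_half_width(n, lower, upper, alpha):
    """Half-width t of the two-sided Hoeffding interval for the mean of n
    independent observations, each bounded in [lower, upper].

    Classical Hoeffding: P(|mean - mu| >= t) <= 2 * exp(-2 n t^2 / (upper - lower)^2).
    Setting the right-hand side equal to alpha and solving for t gives the
    expression returned below. (Hypothetical helper, not from the paper.)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    return (upper - lower) * math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


if __name__ == "__main__":
    # Illustrative setting: measurements known to lie in [-1, 1], 95% level.
    for n in (2, 3, 30):
        t = hoeffding_half_width(n, -1.0, 1.0, 0.05)
        print(f"n = {n:2d}: interval is sample mean +/- {t:.3f}")
```

With two or three replicates the half-width exceeds half of the full range of the data, so the interval cannot exclude zero and therefore cannot rule out a sign error. That is the regime in which a tighter bound is useful.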