We consider high-throughput experiments that measure many parameters. Due to resource limitations, ``breadth-first'' high-throughput experiments take only a few independent samples of each parameter, which makes it challenging to assess estimator error. We propose a new model-free method for bounding type S (sign) errors in this context, based on a quantity we call the Cross-replicate Sign Error Rate (CSER). The CSER is the expected rate of sign disagreement between a fixed set of estimates and estimates based on an independent experimental replicate. To show that the CSER can be estimated accurately enough to be useful in practice, we develop new improvements to Hoeffding's bounds for sums of bounded random variables, yielding the tightest bounds obtainable from the Chernoff inequality. We apply this method to measurements from cell-perturbation experiments. Our method reveals that existing error control practices fail to control error at their nominal level in some cases and are needlessly conservative in others. The CSER is easy to estimate, enabling practitioners to detect problems in their experimental designs and to identify subsets of parameters with a low proportion of type S errors.
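To make the quantity concrete, the following is a minimal sketch of how a cross-replicate sign disagreement rate might be computed and bounded. It is not the paper's estimator: the function names are hypothetical, the bound is the classical one-sided Hoeffding inequality rather than the improved bounds described above, and it assumes the per-parameter disagreement indicators are independent. Treating a zero estimate as disagreeing with any nonzero estimate is likewise a convention chosen only for this illustration.

```python
import math


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def empirical_sign_disagreement(estimates, replicate):
    """Fraction of parameters whose two estimates have different signs.

    `estimates` is the fixed set of estimates; `replicate` holds estimates
    of the same parameters from an independent experimental replicate.
    """
    if len(estimates) != len(replicate) or len(estimates) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    disagreements = sum(
        1 for a, b in zip(estimates, replicate) if sign(a) != sign(b)
    )
    return disagreements / len(estimates)


def hoeffding_upper_bound(rate, n, delta):
    """Classical one-sided Hoeffding upper bound on the expected rate.

    Assumes (for this sketch) n independent indicators in [0, 1]. With
    probability at least 1 - delta, the expected disagreement rate is at
    most rate + sqrt(log(1 / delta) / (2 * n)). The result is capped at 1.
    """
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return min(1.0, rate + margin)


# Illustrative values only, not data from any experiment.
fixed = [1.0, -2.0, 0.5, -0.1]
rep = [0.8, 1.5, 0.2, -0.3]
rate = empirical_sign_disagreement(fixed, rep)
bound = hoeffding_upper_bound(rate, len(fixed), delta=0.05)
```

With only four parameters the Hoeffding margin is very wide, which illustrates why tighter concentration bounds matter when the number of measurements is limited.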