Approximate Message Passing with Rigorous Guarantees for Pooled Data and Quantitative Group Testing

In the pooled data problem, the goal is to identify the categories associated with a large collection of items via a sequence of pooled tests. Each pooled test reveals the number of items of each category within the pool. We study an approximate message passing (AMP) algorithm for estimating the categories and rigorously characterize its performance, in both the noiseless and noisy settings. For the noiseless setting, we show that the AMP algorithm is equivalent to one recently proposed by El Alaoui et al. Our results provide a rigorous version of their performance guarantees, previously obtained via non-rigorous techniques. For the case of pooled data with two categories, known as quantitative group testing (QGT), we use the AMP guarantees to compute precise limiting values of the false positive rate and the false negative rate. Though the pooled data problem and QGT are both instances of estimation in a linear model, existing AMP theory cannot be directly applied since the design matrices are binary valued. The key technical ingredient in our analysis is a rigorous asymptotic characterization of AMP for generalized linear models defined via generalized white noise design matrices. This result, established using a recent universality result of Wang et al., is of independent interest. Our theoretical results are validated by numerical simulations. For comparison, we propose estimators based on convex relaxation and iterative thresholding, without providing theoretical guarantees. The simulations indicate that AMP outperforms the convex estimator for noiseless pooled data and QGT, but the convex estimator performs slightly better for noisy pooled data with three categories when the number of observations is small.
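The noiseless observation model described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `simulate_pooled_data`, the uniform prior over categories, and the inclusion probability `pool_prob` are assumptions made only for illustration. Each item has a hidden category, each test pools a random subset of items (a binary design matrix), and the test outcome is the vector of per-category counts within the pool. With two categories this reduces to QGT, where the count for one category is the number of defective items in the pool.

```python
import random


def simulate_pooled_data(n_items, n_tests, n_categories, pool_prob=0.5, seed=0):
    """Return hidden categories, a binary design matrix, and noiseless pooled counts.

    Illustrative assumptions: categories are drawn uniformly at random, and
    each item joins each pool independently with probability pool_prob.
    """
    rng = random.Random(seed)
    # Hidden category label of each item.
    categories = [rng.randrange(n_categories) for _ in range(n_items)]
    # Binary design matrix: design[i][j] == 1 if item j is in test i.
    design = [
        [1 if rng.random() < pool_prob else 0 for _ in range(n_items)]
        for _ in range(n_tests)
    ]
    # Each test reveals how many pooled items belong to each category.
    counts = []
    for row in design:
        tally = [0] * n_categories
        for included, category in zip(row, categories):
            if included:
                tally[category] += 1
        counts.append(tally)
    return categories, design, counts


categories, design, counts = simulate_pooled_data(
    n_items=12, n_tests=4, n_categories=3
)
for row, tally in zip(design, counts):
    # Pool size followed by the per-category counts observed for that test.
    print(sum(row), tally)
```

The estimation task is to recover `categories` from `design` and `counts`. A noisy variant would perturb each entry of `counts` before it is observed.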
