In the pooled data problem, the goal is to identify the categories associated with a large collection of items via a sequence of pooled tests. Each pooled test reveals the number of items of each category within the pool. We study an approximate message passing (AMP) algorithm for estimating the categories and rigorously characterize its performance, in both the noiseless and noisy settings. For the noiseless setting, we show that the AMP algorithm is equivalent to one recently proposed by El Alaoui et al. Our results provide a rigorous version of their performance guarantees, previously obtained via non-rigorous techniques. For the case of pooled data with two categories, known as quantitative group testing (QGT), we use the AMP guarantees to compute precise limiting values of the false positive rate and the false negative rate. Though the pooled data problem and QGT are both instances of estimation in a linear model, existing AMP theory cannot be directly applied since the design matrices are binary valued. The key technical ingredient in our result is a rigorous analysis of AMP for generalized linear models defined via generalized white noise design matrices. This result, established using a recent universality result of Wang et al., is of independent interest. Our theoretical results are validated by numerical simulations. For comparison, we propose estimators based on convex relaxation and iterative thresholding, without providing theoretical guarantees. Our simulations indicate that AMP outperforms the convex programming estimator for a range of QGT scenarios, but the convex program performs better for pooled data with three categories.
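As an illustration of the measurement model described above, the following is a minimal sketch of how pooled tests produce per-category counts. It is not the AMP algorithm studied in the paper. The function name `simulate_pooled_tests`, the Bernoulli pooling probability `pool_prob`, and the additive Gaussian noise controlled by `noise_std` are assumptions introduced here for illustration only.

```python
import numpy as np


def simulate_pooled_tests(n_items=12, n_tests=5, n_categories=3,
                          pool_prob=0.5, noise_std=0.0, seed=0):
    """Simulate pooled tests (hypothetical helper, illustrative only).

    Returns the item labels, their one-hot encoding X, the binary design
    matrix A, and the test outcomes Y, where Y[i, k] is the number of
    items of category k in pool i (plus optional noise).
    """
    rng = np.random.default_rng(seed)

    # Each item belongs to exactly one category.
    labels = rng.integers(0, n_categories, size=n_items)
    X = np.eye(n_categories, dtype=int)[labels]  # shape (n_items, n_categories)

    # Binary-valued design: A[i, j] = 1 if item j is included in pool i.
    # Assumption: each item joins each pool independently with prob. pool_prob.
    A = (rng.random((n_tests, n_items)) < pool_prob).astype(int)

    # Noiseless outcome: per-category counts within each pool.
    Y = A @ X

    # Assumption: the noisy setting is modeled here as additive Gaussian noise.
    if noise_std > 0:
        Y = Y + noise_std * rng.standard_normal(Y.shape)

    return labels, X, A, Y


if __name__ == "__main__":
    labels, X, A, Y = simulate_pooled_tests()
    print("labels:", labels)
    print("counts per pool and category:\n", Y)
```

With `n_categories=2`, the second column of `Y` counts the items of category 1 in each pool, which corresponds to the quantitative group testing (QGT) setting. In the noiseless case, each row of `Y` sums to the size of the corresponding pool.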