In the pooled data problem, the goal is to identify the categories associated with a large collection of items via a sequence of pooled tests. Each pooled test reveals the number of items of each category within the pool. We study an approximate message passing (AMP) algorithm for estimating the categories and rigorously characterize its performance, in both the noiseless and noisy settings. For the noiseless setting, we show that the AMP algorithm is equivalent to one recently proposed by El Alaoui et al. Our results provide a rigorous version of their performance guarantees, previously obtained via non-rigorous techniques. For the case of pooled data with two categories, known as quantitative group testing (QGT), we use the AMP guarantees to compute precise limiting values of the false positive rate and the false negative rate. Though the pooled data problem and QGT are both instances of estimation in a linear model, existing AMP theory cannot be directly applied since the design matrices are binary-valued. The key technical ingredient in our analysis is a rigorous asymptotic characterization of AMP for generalized linear models defined via generalized white noise design matrices. This result, established using a recent universality result of Wang et al., is of independent interest. Our theoretical results are validated by numerical simulations. For comparison, we propose estimators based on convex relaxation and iterative thresholding, without providing theoretical guarantees. The simulations indicate that AMP outperforms the convex estimator for noiseless pooled data and QGT, but the convex estimator performs slightly better for noisy pooled data with three categories when the number of observations is small.
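To make the observation model concrete, the following is a minimal sketch of noiseless pooled tests and of the two QGT error rates. It is an illustration under stated assumptions, not the paper's implementation: the function names `simulate_pooled_data` and `error_rates`, the uniform category prior, and the choice of including each item in a pool independently with probability `pool_prob` are all hypothetical. The error rates use the standard definitions (false positives divided by true negatives, false negatives divided by true positives), which may differ in detail from those used in the paper.

```python
import random


def simulate_pooled_data(n_items, n_tests, n_categories, pool_prob=0.5, seed=0):
    """Simulate noiseless pooled tests (illustrative assumptions only).

    Returns (labels, pools, counts), where labels[j] is the category of
    item j, pools[i] lists the item indices in test i, and counts[i][c]
    is the number of items of category c in pool i.
    """
    rng = random.Random(seed)
    # Assumption: categories drawn uniformly at random.
    labels = [rng.randrange(n_categories) for _ in range(n_items)]
    pools, counts = [], []
    for _ in range(n_tests):
        # Assumption: binary design, each item joins the pool independently.
        pool = [j for j in range(n_items) if rng.random() < pool_prob]
        row = [0] * n_categories
        for j in pool:
            row[labels[j]] += 1  # the test reveals per-category counts
        pools.append(pool)
        counts.append(row)
    return labels, pools, counts


def error_rates(true_labels, est_labels):
    """False positive and false negative rates for two categories (QGT).

    Label 1 marks a defective item, label 0 a non-defective one.
    """
    pairs = list(zip(true_labels, est_labels))
    fp = sum(1 for t, e in pairs if t == 0 and e == 1)
    fn = sum(1 for t, e in pairs if t == 1 and e == 0)
    negatives = sum(1 for t in true_labels if t == 0)
    positives = len(true_labels) - negatives
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr
```

Each row of `counts` sums to the size of the corresponding pool, which is the linear-model structure the abstract refers to: the observations are a binary design matrix applied to one-hot category indicators.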