We study Dorfman's classical group testing protocol in a novel setting where individual specimen statuses are modeled as exchangeable random variables. Our motivation is infectious disease screening, where specimens that arrive together for testing often originate from the same community, so their statuses may be positively correlated. Dorfman's protocol screens a population of n specimens for a binary trait by partitioning it into nonoverlapping groups, testing each group, and individually retesting only the specimens in groups that test positive. The partition is chosen to minimize the expected number of tests under a probabilistic model of specimen statuses. We relax the typical assumption that these statuses are independent and identically distributed and instead model them as exchangeable random variables, meaning that their joint distribution is invariant under permutations. We characterize such distributions in terms of a function q, where q(h) is the marginal probability that any group of size h tests negative. We use this interpretable representation to show that the set partitioning problem arising in Dorfman's protocol reduces to an integer partitioning problem that can be solved efficiently. We apply these tools to an empirical dataset from the COVID-19 pandemic. The methodology helps explain the unexpectedly high empirical efficiency reported by the original investigators.
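To illustrate the reduction to integer partitioning, the following is a minimal sketch, not the authors' implementation. It assumes the standard Dorfman cost: a group of size h ≥ 2 uses one pooled test plus h retests when the pool is positive, for an expected 1 + h(1 − q(h)) tests, and a group of size 1 uses exactly one test. Because q depends only on group size, the total expected cost depends only on the multiset of group sizes, so a simple dynamic program over sizes suffices. The function names (`group_cost`, `best_partition`, `q_iid`, `q_beta`) and the beta-mixture example of an exchangeable model are illustrative assumptions, not taken from the paper.

```python
from typing import Callable, List, Tuple


def group_cost(h: int, q: Callable[[int], float]) -> float:
    """Expected tests for one group of size h under the assumed Dorfman cost."""
    if h == 1:
        return 1.0  # a singleton is simply tested once
    # One pooled test, plus h individual retests if the pool is positive.
    return 1.0 + h * (1.0 - q(h))


def best_partition(n: int, q: Callable[[int], float]) -> Tuple[float, List[int]]:
    """Minimize total expected tests over integer partitions of n (O(n^2) DP)."""
    costs = [0.0] + [group_cost(h, q) for h in range(1, n + 1)]
    best = [0.0] + [float("inf")] * n  # best[m]: optimal cost for m specimens
    choice = [0] * (n + 1)             # choice[m]: one group size used at m
    for m in range(1, n + 1):
        for h in range(1, m + 1):
            c = costs[h] + best[m - h]
            if c < best[m]:
                best[m], choice[m] = c, h
    # Recover the group sizes by walking back through the recorded choices.
    sizes, m = [], n
    while m > 0:
        sizes.append(choice[m])
        m -= choice[m]
    return best[n], sorted(sizes, reverse=True)


def q_iid(p: float) -> Callable[[int], float]:
    """q(h) for independent specimens with prevalence p."""
    return lambda h: (1.0 - p) ** h


def q_beta(a: float, b: float) -> Callable[[int], float]:
    """q(h) = E[(1 - P)^h] with P ~ Beta(a, b): one example of an
    exchangeable, positively correlated model (an illustrative assumption)."""
    def q(h: int) -> float:
        out = 1.0
        for i in range(h):
            out *= (b + i) / (a + b + i)
        return out
    return q


if __name__ == "__main__":
    n = 100
    print(best_partition(n, q_iid(0.02)))        # independent statuses
    print(best_partition(n, q_beta(0.2, 9.8)))   # same mean prevalence, correlated
```

Comparing the two calls at equal mean prevalence gives a rough feel for how positive correlation raises q(h) and can lower the expected number of tests, in the spirit of the efficiency gain discussed above.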