We study Dorfman's classical group testing protocol in a novel setting where individual specimen statuses are modeled as exchangeable random variables. Our motivation is infectious disease screening, where specimens that arrive together for testing often originate from the same community, so their statuses may be positively correlated. Dorfman's protocol screens a population of n specimens for a binary trait by partitioning it into non-overlapping groups, testing each group, and individually retesting only the specimens in groups that test positive. The partition is chosen to minimize the expected number of tests under a probabilistic model of specimen statuses. We relax the typical assumption that the statuses are independent and identically distributed and instead model them as exchangeable random variables, meaning that their joint distribution is invariant under permutations of the specimens. We characterize such distributions in terms of a function q, where q(h) is the marginal probability that any group of size h tests negative. We use this interpretable representation to show that the set partitioning problem arising in Dorfman's protocol reduces to an integer partitioning problem that can be solved efficiently. We apply these tools to an empirical dataset from the COVID-19 pandemic. The methodology helps explain the unexpectedly high empirical efficiency reported by the original investigators.
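The reduction to integer partitioning can be illustrated with a minimal sketch. It assumes the standard Dorfman cost accounting, which the abstract does not spell out: a group of size h ≥ 2 costs one pooled test plus h retests with probability 1 − q(h), and a group of size 1 costs a single test. The names `group_cost` and `best_partition` are hypothetical, and the dynamic program below is one straightforward way to search over integer partitions of n, not necessarily the authors' algorithm.

```python
from math import inf
from typing import Callable, List, Tuple


def group_cost(h: int, q: Callable[[int], float]) -> float:
    """Expected number of tests spent on one group of size h.

    Assumption: a singleton is tested once and never retested; a larger
    group is pooled once and, if positive (probability 1 - q(h)), all h
    members are retested individually.
    """
    if h == 1:
        return 1.0
    return 1.0 + h * (1.0 - q(h))


def best_partition(n: int, q: Callable[[int], float]) -> Tuple[float, List[int]]:
    """Minimize the summed group costs over integer partitions of n.

    Under exchangeability the cost of a group depends only on its size,
    so only the multiset of group sizes matters. best[m] holds the
    minimum expected number of tests for m specimens.
    """
    best = [0.0] + [inf] * n
    choice = [0] * (n + 1)
    for m in range(1, n + 1):
        for h in range(1, m + 1):
            cost = group_cost(h, q) + best[m - h]
            if cost < best[m]:
                best[m] = cost
                choice[m] = h
    # Recover the group sizes by walking the recorded choices back from n.
    sizes = []
    m = n
    while m > 0:
        sizes.append(choice[m])
        m -= choice[m]
    return best[n], sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Illustrative special case: i.i.d. statuses with prevalence p,
    # for which q(h) = (1 - p) ** h. The value of p is made up.
    p = 0.02
    expected_tests, sizes = best_partition(100, lambda h: (1 - p) ** h)
    print(expected_tests, sizes)
```

The i.i.d. model in the usage example is only a convenient special case of exchangeability; any non-increasing q with q(h) in [0, 1] supplied by a fitted exchangeable model could be passed in its place. The double loop runs in O(n²) time.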