Reconstructing Network Outbreaks under Group Surveillance

A key public health problem during an outbreak is reconstructing the disease cascade from a partial set of confirmed infections. This problem has been studied extensively under the Maximum Likelihood Estimation (MLE) formulation, which reduces it to finding a type of Steiner subgraph on a network. Group surveillance, such as wastewater or aerosol monitoring, is a form of mass (pooled) testing in which samples from multiple individuals are combined and tested together. While a single negative test clears every individual in the pool, a positive test does not reveal which individuals in the pool are infected. We introduce the POOLCASCADEMLE problem in the setting of a network propagation process, where the goal is to find an MLE cascade subgraph that is consistent with the pooled test outcomes. Previous work on reconstruction assumes that test results are for individuals, i.e., pools of size one, and requires a consistent cascade to connect the positive-testing nodes. In POOLCASCADEMLE, a consistent cascade must instead include at least one node from each positive pool, which adds another combinatorial layer. We show that, under the Independent Cascade (IC) model, POOLCASCADEMLE is NP-hard, and we present an approximation algorithm based on a reduction to the Group Steiner Tree problem. We also consider a one-hop version of this problem, in which the disease can spread for only one time step after being seeded. We show that even this restricted version is NP-hard, and we develop a method based on linear programming relaxation and rounding. We evaluate our methods on real and synthetic contact networks in terms of missing-infection recovery and prevalence estimation. We find that our approach outperforms meaningful baselines that correspond to pools of size one and use state-of-the-art methods.
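The consistency requirement described above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each pool is a set of node identifiers and each outcome is a boolean (`True` for a positive test). The function name `is_consistent` and the toy data are illustrative assumptions, not the paper's implementation. The sketch checks only the two pooled-test rules stated in the abstract and does not model the likelihood or the connectivity of the cascade.

```python
from typing import Hashable, Iterable, Sequence, Set


def is_consistent(
    infected: Iterable[Hashable],
    pools: Sequence[Set[Hashable]],
    outcomes: Sequence[bool],
) -> bool:
    """Check a candidate infected set against pooled test outcomes.

    Hypothetical helper: a negative pool must contain no infected node,
    and a positive pool must contain at least one infected node.
    """
    if len(pools) != len(outcomes):
        raise ValueError("each pool needs exactly one outcome")
    infected_set = set(infected)
    for pool, positive in zip(pools, outcomes):
        has_infected_member = bool(infected_set & set(pool))
        # A positive pool with no infected member, or a negative pool
        # with an infected member, contradicts the test result.
        if positive != has_infected_member:
            return False
    return True


# Toy instance (assumed data): two positive pools and one negative pool.
pools = [{"a", "b", "c"}, {"c", "d"}, {"e", "f"}]
outcomes = [True, True, False]

print(is_consistent({"c"}, pools, outcomes))       # one node covers both positive pools
print(is_consistent({"a"}, pools, outcomes))       # second positive pool is left uncovered
print(is_consistent({"c", "e"}, pools, outcomes))  # "e" lies in a negative pool
```

With pools of size one, the same check reduces to the individual-testing setting: every positive node must be in the cascade and every negative node must be excluded.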
