The group testing problem is a canonical inference task in which one seeks to identify $k$ infected individuals in a population of $n$ people, based on the outcomes of $m$ group tests. Of particular interest is Bernoulli group testing (BGT), where each individual participates in each test independently with a fixed probability. BGT is known to be an "information-theoretically" optimal design: there exists a decoder that, with high probability as $n$ grows, identifies the infected individuals using $m^*=\log_2 \binom{n}{k}$ BGT tests, which is the minimum number of tests required among \emph{all} group testing designs. An important open question in the field is whether there exists a polynomial-time decoder for BGT that also succeeds with $m^*$ tests. A recent paper (Iliopoulos, Zadik COLT '21) presented some evidence, but no proof, that a simple low-temperature MCMC method could succeed. The evidence was based on a first-moment (or "annealed") analysis of the landscape, as well as on simulations in which the MCMC method succeeds for $n$ in the thousands. In this work, we prove that, despite this intriguing success in simulations for small $n$, the class of MCMC methods proposed in previous work for BGT with $m^*$ tests takes super-polynomial-in-$n$ time to identify the infected individuals when $k=n^{\alpha}$ for $\alpha \in (0,1)$ small enough. Towards obtaining our results, we establish the tight max-satisfiability thresholds of the random $k$-set cover problem, a result of potentially independent interest in the study of random constraint satisfaction problems.
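To make the BGT setup concrete, the following is a minimal Python sketch of the design and of the test count $m^*$, not the decoder or the MCMC method studied here. The function names (`min_tests`, `bernoulli_design`, `pool_outcomes`) and the values of $n$ and $k$ are hypothetical. The participation probability $p = 1 - 2^{-1/k}$ is also an assumption, chosen so that each test is positive with probability $1/2$; the abstract only states that the probability is fixed.

```python
import math
import random


def min_tests(n, k):
    """Information-theoretic test count m* = log2 C(n, k), rounded up to an integer."""
    return math.ceil(math.log2(math.comb(n, k)))


def bernoulli_design(n, m, p, rng):
    """Build m pools; each of the n individuals joins each pool independently w.p. p."""
    return [{i for i in range(n) if rng.random() < p} for _ in range(m)]


def pool_outcomes(design, infected):
    """A pool tests positive iff it contains at least one infected individual."""
    return [not pool.isdisjoint(infected) for pool in design]


rng = random.Random(0)            # fixed seed so the sketch is reproducible
n, k = 1000, 5                    # illustrative sizes, not taken from the paper
m = min_tests(n, k)
p = 1 - 2 ** (-1 / k)             # assumption: makes P(test is positive) = 1/2
infected = set(rng.sample(range(n), k))
design = bernoulli_design(n, m, p, rng)
outcomes = pool_outcomes(design, infected)
```

A decoder receives only `design` and `outcomes` and must recover `infected`; the question addressed here is whether this can be done in polynomial time when only $m^*$ tests are available.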