On The MCMC Performance In Bernoulli Group Testing And The Random Max Set-Cover Problem

The group testing problem is a canonical inference task in which one seeks to identify $k$ infected individuals in a population of $n$ people, based on the outcomes of $m$ group tests. Of particular interest is Bernoulli group testing (BGT), where each individual participates in each test independently and with a fixed probability. BGT is known to be an "information-theoretically" optimal design: there exists a decoder that identifies the infected individuals with high probability as $n$ grows using $m^*=\log_2 \binom{n}{k}$ BGT tests, which is the minimum number of tests required among \emph{all} group testing designs. An important open question in the field is whether a polynomial-time decoder exists for BGT that also succeeds with $m^*$ tests. A recent paper (Iliopoulos, Zadik COLT '21) presented some evidence, but no proof, that a simple low-temperature MCMC method could succeed. The evidence was based on a first-moment (or "annealed") analysis of the landscape, together with simulations in which the MCMC method succeeds for $n$ in the thousands. In this work, we prove that, despite this intriguing success in simulations for small $n$, the class of MCMC methods proposed in previous work for BGT with $m^*$ tests takes super-polynomial-in-$n$ time to identify the infected individuals when $k=n^{\alpha}$ for $\alpha \in (0,1)$ small enough. Towards obtaining our results, we establish the tight max-satisfiability thresholds of the random $k$-set cover problem, a result of potentially independent interest in the study of random constraint satisfaction problems.
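To make the setup concrete, the following is a minimal Python sketch of a BGT instance and of the quantity $m^*=\log_2 \binom{n}{k}$. It is an illustration under stated assumptions, not the construction or the MCMC method analyzed in the paper. The function names are hypothetical, and the participation probability $p = 1 - 2^{-1/k}$ (chosen so that each test is negative with probability $1/2$) is an assumption, since the abstract only says the probability is fixed. The helper `covered_positive_tests` counts how many positive tests a candidate set intersects, which is the set-cover view of the problem.

```python
import math
import random


def information_theoretic_tests(n, k):
    """Return m* = log2 C(n, k), rounded up to a whole number of tests."""
    return math.ceil(math.log2(math.comb(n, k)))


def simulate_bgt(n, k, m, seed=0):
    """Draw a Bernoulli group testing instance (hypothetical helper).

    Assumption: each individual joins each test independently with
    probability p = 1 - 2**(-1/k), so a test is negative with
    probability (1 - p)**k = 1/2.
    """
    rng = random.Random(seed)
    infected = set(rng.sample(range(n), k))
    p = 1.0 - 2.0 ** (-1.0 / k)
    tests = [{i for i in range(n) if rng.random() < p} for _ in range(m)]
    # A test is positive exactly when it contains at least one infected individual.
    outcomes = [bool(t & infected) for t in tests]
    return infected, tests, outcomes


def covered_positive_tests(candidate, tests, outcomes):
    """Count the positive tests that contain at least one member of `candidate`."""
    return sum(1 for t, y in zip(tests, outcomes) if y and (t & candidate))


if __name__ == "__main__":
    n, k = 1000, 10
    m_star = information_theoretic_tests(n, k)
    infected, tests, outcomes = simulate_bgt(n, k, m_star)
    print("m* =", m_star)
    print("positive tests:", sum(outcomes))
    print("covered by true set:", covered_positive_tests(infected, tests, outcomes))
```

By construction, the true infected set covers every positive test and appears in no negative test. The decoding question is whether a search over $k$-subsets can find such a set efficiently when only about $m^*$ tests are available.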

