Controlled Discovery and Localization of Signals via Bayesian Linear Programming

Scientists must often simultaneously localize and discover signals. For instance, in genetic fine-mapping, high correlations between nearby genetic variants make it hard to identify the exact locations of causal variants. The statistical task is therefore to output as many disjoint regions containing a signal as possible, each as small as possible, while controlling false positives. Similar problems arise in any application where signals cannot be perfectly localized, such as locating stars in astronomical surveys and detecting changepoints in sequential data. Common Bayesian approaches to these problems involve computing a posterior distribution over signal locations. However, existing procedures for translating these posteriors into credible regions for the signals fail to capture all the information in the posterior, leading to lower power and (sometimes) inflated false discoveries. With this motivation, we introduce Bayesian Linear Programming (BLiP). Given a posterior distribution over signals, BLiP outputs credible regions for signals, overcoming an extremely high-dimensional and nonconvex problem to verifiably nearly maximize expected power while controlling false positives. BLiP is very computationally efficient compared to the cost of computing the posterior and can wrap around nearly any Bayesian model and algorithm. Applying BLiP to existing state-of-the-art analyses of UK Biobank data (for genetic fine-mapping) and the Sloan Digital Sky Survey (for astronomical point source detection) increased power by 30-120% with just a few minutes of additional computation. BLiP is implemented in pyblip (Python) and blipr (R).
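To make the task in the abstract concrete, the following is a minimal sketch of how selecting disjoint regions under a false discovery constraint can be written as a small linear integer program. It is an illustration under stated assumptions, not the pyblip or blipr API: the function name `select_regions`, the weight 1/|G| that rewards smaller regions, the linearized form of the false discovery constraint, and the toy posterior probabilities are all hypothetical choices made for this example. It solves the toy problem exactly with SciPy's `milp` (SciPy 1.9 or later), which is only practical for small inputs.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def select_regions(groups, pips, n_locations, q=0.1):
    """Pick disjoint candidate regions (hypothetical helper, not pyblip).

    groups: list of tuples of location indices (candidate regions).
    pips:   posterior probability that each region contains a signal.
    q:      target bound on the expected fraction of false discoveries.
    """
    pips = np.asarray(pips, dtype=float)
    sizes = np.array([len(g) for g in groups], dtype=float)

    # Assumed objective: expected number of true discoveries, with each
    # region down-weighted by its size so smaller regions are preferred.
    # milp minimizes, so the weights are negated.
    c = -pips / sizes

    # Disjointness: each location is covered by at most one selected region.
    A_disjoint = np.zeros((n_locations, len(groups)))
    for k, g in enumerate(groups):
        A_disjoint[list(g), k] = 1.0

    # Assumed false discovery constraint, linear in x:
    #   sum_G (1 - p_G) x_G <= q * sum_G x_G
    A_fdr = (1.0 - pips - q)[None, :]

    A = np.vstack([A_disjoint, A_fdr])
    upper = np.concatenate([np.ones(n_locations), [0.0]])

    res = milp(
        c,
        constraints=[LinearConstraint(A, -np.inf, upper)],
        integrality=np.ones(len(groups)),  # every x_G is binary
        bounds=Bounds(0, 1),
    )
    return [groups[k] for k in np.flatnonzero(res.x > 0.5)]


# Toy posterior: location 0 is well localized; locations 1 and 2 are
# individually uncertain, but the pair almost surely contains a signal.
groups = [(0,), (1,), (2,), (1, 2)]
pips = [0.98, 0.5, 0.45, 0.95]
selected = select_regions(groups, pips, n_locations=3, q=0.1)
print(selected)
```

In this toy input, neither location 1 nor location 2 can be reported alone without violating the false discovery constraint, so the sketch reports the single location 0 together with the coarser region covering locations 1 and 2. This is the resolution-adaptive behavior the abstract describes: regions are made as small as the posterior allows.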
