Binary responses arise in many statistical problems, including binary classification, bioassay, current status data problems and sensitivity estimation. Such problems have interested the Bayesian nonparametrics community since the early 1970s. However, for a wide range of modern simulation-based models, inference from binary data is intractable, even with MCMC methods. Recently, Christensen (2023) introduced a novel simulation technique based on counting permutations, which can estimate both posterior distributions and marginal likelihoods for any model from which a random sample can be generated. However, the accompanying implementation of this technique struggles with large sample sizes (n > 250). Here we present perms, a new implementation of this technique that is substantially faster than the original implementation and can handle larger data problems. It is available both as an R package and as a Python library. The basic usage of perms is illustrated via two simple examples: a tractable toy problem and a bioassay problem. A more complex example involving changepoint analysis is also considered. We also cover the details of the implementation and illustrate the computational speed gain of perms via a simple simulation study.