Divide-and-conquer with finite sample sizes: valid and efficient possibilistic inference

Divide-and-conquer methods use large-sample approximations to provide frequentist guarantees when each block of data is both small enough to facilitate efficient computation and large enough to support approximately valid inferences. When the overall sample size is small or moderate, there is likely no division of the data that meets both requirements, so the resulting inference lacks validity guarantees. We propose a new approach, couched in the inferential model framework, that is fully conditional in a Bayesian sense and provably valid in a frequentist sense. The main insight is that existing divide-and-conquer approaches invoke a Gaussianity assumption twice: first in the construction of an estimator, and second in the approximation to its sampling distribution. Our proposal is to retain the first Gaussianity assumption, using a Gaussian working likelihood, but to replace the second with a validification step that uses the sampling distributions of the block summaries determined by the posited model. This latter step, a type of probability-to-possibility transform, is key to the reliability guarantees enjoyed by our approach, which are uniquely general in the divide-and-conquer literature. In addition to its finite-sample validity guarantees, our proposed approach is asymptotically efficient, like other divide-and-conquer solutions available in the literature. Our computational strategy leverages state-of-the-art black-box likelihood emulators. We demonstrate our method's performance via simulations and highlight its flexibility with an analysis of median PM2.5 in Maryborough, Queensland, during the 2023 Australian bushfire season.
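To make the validification idea concrete, the following is a minimal sketch of a Monte Carlo probability-to-possibility transform applied to block summaries. It is an illustration under stated assumptions, not the implementation used in this work: the posited model is assumed to be exponential with unknown mean theta, the blocks are assumed to have equal size, the block summaries are block means, and a Wald-type statistic stands in for the Gaussian working-likelihood ranking. The function names `block_means`, `ranking`, and `possibility_contour` are hypothetical, and the sampling distribution of the summaries is simulated directly rather than handled through a likelihood emulator.

```python
import numpy as np


def block_means(data, n_blocks):
    """Split the data into blocks and keep only each block's mean (the summary)."""
    blocks = np.array_split(np.asarray(data, dtype=float), n_blocks)
    return np.array([b.mean() for b in blocks])


def ranking(means, block_size, theta):
    """Wald-type ranking from a Gaussian working likelihood (assumed form).

    Larger values mean the summaries agree less with theta. Under the assumed
    exponential model the combined estimator has variance theta**2 / n.
    """
    n = block_size * means.shape[-1]
    return n * (means.mean(axis=-1) - theta) ** 2 / theta ** 2


def possibility_contour(means, block_size, theta, n_sim=5000, rng=None):
    """Monte Carlo probability-to-possibility transform at a single theta.

    Returns the estimated probability, under the posited model at theta, that
    simulated block summaries rank at least as badly as the observed ones.
    """
    rng = np.random.default_rng(rng)
    observed = ranking(means, block_size, theta)
    # The mean of block_size iid Exponential(scale=theta) draws is
    # Gamma(shape=block_size, scale=theta / block_size), so the summaries
    # can be simulated from their exact sampling distribution.
    sims = rng.gamma(shape=block_size, scale=theta / block_size,
                     size=(n_sim, means.size))
    return float(np.mean(ranking(sims, block_size, theta) >= observed))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.exponential(scale=2.0, size=24)  # small overall sample
    means = block_means(data, n_blocks=4)       # 4 blocks of 6 observations
    for theta in (1.0, 2.0, 4.0):
        print(theta, possibility_contour(means, 6, theta, rng=1))
```

Under these assumptions, the set of theta values whose contour exceeds alpha serves as a 100(1 - alpha)% confidence set. Its coverage relies on the simulated sampling distribution of the block summaries rather than on a Gaussian approximation to it, and is exact only up to Monte Carlo error.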
