Repro Samples Method for a Performance Guaranteed Inference in General and Irregular Inference Problems

Rapid advancements in data science require fundamentally new frameworks to tackle prevalent but highly non-trivial "irregular" inference problems, to which the large-sample central limit theorem does not apply. Typical examples include problems involving discrete or non-numerical parameters and problems involving non-numerical data. In this article, we present an innovative, wide-reaching, and effective approach, called the "repro samples method," to conduct statistical inference for these irregular problems and beyond. The development relates to, but improves on, several existing simulation-inspired inference approaches, and we provide both exact and approximate theories to support it. Moreover, the proposed approach is broadly applicable and subsumes the classical Neyman-Pearson framework as a special case. For the commonly encountered irregular inference problems that involve both discrete/non-numerical and continuous parameters, we propose an effective three-step procedure to make inferences for all parameters. We also develop a unique matching scheme that turns the discreteness of discrete/non-numerical parameters from an obstacle to forming inferential theories into a beneficial attribute for improving computational efficiency. We demonstrate the effectiveness of the proposed general methodology using various examples, including a case study on a Gaussian mixture model with an unknown number of components. This case study provides a solution to a long-standing open inference question in statistics: how to quantify the estimation uncertainty for the unknown number of components and other associated parameters. Real data and simulation studies, with comparisons to existing approaches, demonstrate the far superior performance of the proposed method.
