Parameters of a sub-population can be more relevant than those of a super-population. For example, a healthcare provider may be interested in the effect of a treatment plan for a specific subset of their patients; policymakers may be concerned with the impact of a policy in a particular state within a given population. In these cases, the focus is on a specific finite population, as opposed to an infinite super-population. Such a population can be characterized by fixing some attributes that are intrinsic to it, while treating unexplained variation, such as measurement error, as random. Inference for a population with fixed attributes can then be modeled as inference on the parameters of a conditional distribution. Accordingly, it is desirable that confidence intervals be conditionally valid for the realized population, rather than valid only marginally over many possible draws of populations. We provide a statistical inference framework for parameters of finite populations with known attributes. By leveraging the attribute information, our estimators and confidence intervals closely target a specific finite population. When the data come from the population of interest, our confidence intervals attain asymptotic conditional validity given the attributes, and are shorter than those for super-population inference. In addition, we develop procedures to infer parameters of new populations with differing covariate distributions; these confidence intervals are also conditionally valid for the new populations under mild conditions. Our methods extend to situations where the fixed information has a weaker structure or is only partially observed. We demonstrate the validity and applicability of our methods using simulated and real-world data.
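As a rough illustration of why conditioning on attributes can shorten intervals, consider the simplest case of a mean. The notation below is assumed for this sketch and is not taken from the paper: $(X_i, Y_i)$ are i.i.d. pairs with attributes $X_i$ and outcomes $Y_i$, $\theta$ is a super-population mean, and $\theta_n$ is a conditional (finite-population) analogue.

```latex
% Illustrative sketch under assumed notation; not the paper's definitions.
\[
  \theta = \mathbb{E}[Y],
  \qquad
  \theta_n = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[Y_i \mid X_i],
  \qquad
  \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i .
\]
% Marginal variance of the sample mean, by the law of total variance:
\[
  \operatorname{Var}\bigl(\bar{Y} - \theta\bigr)
  = \frac{1}{n}\Bigl(
      \mathbb{E}\bigl[\operatorname{Var}(Y \mid X)\bigr]
      + \operatorname{Var}\bigl(\mathbb{E}[Y \mid X]\bigr)
    \Bigr).
\]
% Conditional variance given the realized attributes X_1, ..., X_n:
\[
  \operatorname{Var}\bigl(\bar{Y} - \theta_n \mid X_1, \dots, X_n\bigr)
  = \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(Y_i \mid X_i),
  \qquad
  \mathbb{E}\Bigl[\tfrac{1}{n^{2}}\textstyle\sum_{i=1}^{n}
      \operatorname{Var}(Y_i \mid X_i)\Bigr]
  = \frac{1}{n}\,\mathbb{E}\bigl[\operatorname{Var}(Y \mid X)\bigr].
\]
```

In this simplified setting the conditional variance omits the between-attribute term $\operatorname{Var}(\mathbb{E}[Y \mid X])/n$, so an interval for $\theta_n$ built from it is on average no wider than the corresponding interval for $\theta$, and strictly narrower whenever $\mathbb{E}[Y \mid X]$ varies with $X$.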