We consider the problem of constructing confidence intervals (CIs) for the population mean of $N$ values $\{x_1, \ldots, x_N\} \subset \Sigma^N$ based on a random sample of size $n$, denoted by $X^n \equiv (X_1, \ldots, X_n)$, drawn uniformly without replacement (WoR). We begin by focusing on the finite alphabet ($|\Sigma| = k < \infty$) and moderate accuracy ($\log(1/\alpha_N) \gg (k+1)\log N$) regime, and derive a fundamental lower bound on the width of any level-$(1-\alpha_N)$ CI in terms of the inverse of the WoR rate functions from the theory of large deviations. Guided by this lower bound, we propose a new level-$(1-\alpha_N)$ CI using an empirical inverse rate function, and show that in certain asymptotic regimes the width of this CI matches the lower bound up to constants. We also derive a dual formulation of the inverse rate function that enables efficient computation of our proposed CI. We then move beyond the finite alphabet case and use a Bernoulli coupling idea to construct an almost sure CI for $\Sigma = [0,1]$, and a conceptually simple nonasymptotic CI for the case of $\Sigma$ being a $(2,D)$-smooth Banach space. For both finite and general alphabets, our results employ classical large deviation techniques in novel ways, thus establishing new connections between estimation under WoR sampling and the theory of large deviations.