This paper revisits the classical problem of interval estimation of a binomial proportion under Huber contamination. Our main result derives the rate of the optimal interval length when the contamination proportion is unknown, under a local minimax framework in which the performance of an interval is evaluated at each point in the parameter space. By comparing this rate with the optimal length of a confidence interval that is allowed to use knowledge of the contamination proportion, we characterize the exact adaptation cost incurred by ignorance of the data quality. Our construction of a confidence interval that achieves local length optimality builds on robust hypothesis testing with a new monotonization step, which guarantees valid coverage, boundary-respecting intervals, and an efficient algorithm for computing the endpoints. The general strategy of interval construction applies beyond the binomial setting and also leads to optimal interval estimation for Poisson data with contamination. We also investigate a closely related Erdős--Rényi model with node contamination. Although its optimal rate of parameter estimation agrees with that of the binomial setting, we show that adaptation to an unknown contamination proportion is provably impossible for interval estimation in that setting.