Comparing group means is foundational to many statistical areas, including two-sample studies, randomized trials, and difference-in-differences designs, yet heavy-tailed outcomes can make conventional estimators unstable. A common remedy is to winsorize the data before estimating the target mean contrast. The dominant approach, pooled winsorization, computes winsorization thresholds from the combined sample across all groups, while the rarely used alternative, separate winsorization, computes them within each group. We study finite-sample deviation bounds for these two winsorization strategies, and we prove an impossibility result: no deterministic rule for selecting the pooled winsorization level can attain the sub-Gaussian rate. In contrast, separate winsorization attains this rate, and the guarantee extends to general linear contrasts of group means. Simulation studies confirm that pooled winsorization can have substantial bias, while separate winsorization remains nearly unbiased and concentrates well around the truth. These results support a simple recommendation: winsorize within each group rather than after pooling.
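To make the distinction between the two strategies concrete, here is a minimal sketch in Python with NumPy. It makes several assumptions that are not taken from the paper: thresholds are set at the empirical `alpha` and `1 - alpha` quantiles, the function names (`separate_winsorized_contrast`, `pooled_winsorized_contrast`) are hypothetical, and the toy data are chosen only to make the effect visible. The paper's actual threshold rule may differ.

```python
import numpy as np


def winsorize(values, lower, upper):
    """Clip every observation into the interval [lower, upper]."""
    return np.clip(values, lower, upper)


def separate_winsorized_contrast(groups, weights, alpha):
    """Linear contrast of group means, with thresholds computed within each group.

    Assumption: thresholds are the empirical alpha and (1 - alpha) quantiles.
    """
    means = []
    for group in groups:
        group = np.asarray(group, dtype=float)
        lower, upper = np.quantile(group, [alpha, 1.0 - alpha])
        means.append(winsorize(group, lower, upper).mean())
    return float(np.dot(weights, means))


def pooled_winsorized_contrast(groups, weights, alpha):
    """Linear contrast of group means, with thresholds computed from the pooled sample.

    Assumption: same quantile rule as above, applied to all groups combined.
    """
    groups = [np.asarray(group, dtype=float) for group in groups]
    pooled = np.concatenate(groups)
    lower, upper = np.quantile(pooled, [alpha, 1.0 - alpha])
    means = [winsorize(group, lower, upper).mean() for group in groups]
    return float(np.dot(weights, means))


# Toy, deliberately extreme data: a large group at 0 and a small group at 10,
# so the true difference in means (second minus first) is 10.
control = np.zeros(90)
treated = np.full(10, 10.0)
weights = [-1.0, 1.0]  # contrast: mean(treated) - mean(control)

print(separate_winsorized_contrast([control, treated], weights, alpha=0.2))  # 10.0
print(pooled_winsorized_contrast([control, treated], weights, alpha=0.2))    # 0.0
```

In this toy case the pooled upper threshold falls inside the larger group, so every observation in the smaller group is clipped down to it and the estimated difference collapses to zero. Computing thresholds within each group leaves the contrast intact. The example is an illustration of the mechanism only, not a reproduction of the simulation studies described in the abstract.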