We consider the problem of estimating multiple multi-dimensional means of different probability distributions on a common space, based on independent samples. Our approach forms estimators as convex combinations of the empirical means computed from these samples. We introduce two strategies for choosing suitable data-dependent convex combination weights: the first uses a testing procedure to identify neighbouring means with low variance, which yields a closed-form plug-in formula for the weights; the second determines the weights by minimizing an upper confidence bound on the quadratic risk. Through theoretical analysis, we evaluate the improvement in quadratic risk that our methods offer over the empirical means. Our analysis takes a dimensional asymptotics perspective and shows that our methods asymptotically approach an oracle (minimax) improvement as the effective dimension of the data increases. We demonstrate the efficacy of our methods for estimating multiple kernel mean embeddings through experiments on both simulated and real-world datasets.
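To make the idea of a testing-based convex combination concrete, the following is a minimal sketch and not the closed-form plug-in formula referred to above. It assumes a simplified neighbour test (a debiased squared distance between empirical means compared with a multiple `tau` of the estimated noise level) and uniform weights over the detected neighbours. The function names, the threshold `tau`, and the uniform weighting are all illustrative assumptions.

```python
import numpy as np


def empirical_means_and_noise(samples):
    """Return the empirical mean of each sample and an estimate of its noise level.

    `samples` is a list of arrays of shape (n_k, d). The noise level of the
    k-th empirical mean is estimated by trace(sample covariance) / n_k.
    """
    means = np.stack([x.mean(axis=0) for x in samples])
    noise = np.array([x.var(axis=0, ddof=1).sum() / x.shape[0] for x in samples])
    return means, noise


def neighbour_weights(means, noise, tau=1.0):
    """Illustrative neighbour test (an assumption, not the paper's procedure).

    Task j is declared a neighbour of task i when the debiased squared
    distance between their empirical means is at most tau * noise[i].
    Each row of the result is a uniform convex weight vector over neighbours.
    """
    sq_dist = ((means[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    # Subtract the noise contribution to estimate ||mu_i - mu_j||^2.
    debiased = sq_dist - noise[:, None] - noise[None, :]
    is_neighbour = debiased <= tau * noise[:, None]
    np.fill_diagonal(is_neighbour, True)  # every task is its own neighbour
    return is_neighbour / is_neighbour.sum(axis=1, keepdims=True)


def combined_estimates(samples, tau=1.0):
    """Estimate every mean as a convex combination of all empirical means."""
    means, noise = empirical_means_and_noise(samples)
    weights = neighbour_weights(means, noise, tau=tau)
    return weights @ means, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 50, 20
    # Hypothetical toy data: two tasks centred at 0 and two centred at 10.
    centres = [0.0, 0.0, 10.0, 10.0]
    samples = [c + rng.standard_normal((n, d)) for c in centres]
    estimates, weights = combined_estimates(samples, tau=3.0)
    print(np.round(weights, 2))
```

In this toy setting, tasks that share a mean are pooled while distant tasks keep zero weight, so each estimate averages over more observations without picking up bias from unrelated distributions. The second strategy, which minimizes an upper confidence bound on the quadratic risk, would replace `neighbour_weights` with an optimization over the weight simplex.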