We consider the problem of recovering an unknown matching between a set of $n$ randomly placed points in $\mathbb{R}^d$ and random perturbations of these points. This can be seen as a model for particle tracking and, more generally, entity resolution. We use matchings in random geometric graphs to derive minimax lower bounds for this problem that hold in great generality. Using these results, we show that, for a broad class of distributions, the order of the number of mistakes made by an estimator that minimizes the sum of squared Euclidean distances is minimax optimal when $d$ is fixed and is optimal up to $n^{o(1)}$ factors when $d = o(\log n)$. In the high-dimensional regime, we consider a setup where both initial positions and perturbations have independent sub-Gaussian coordinates. In this setup, we give sufficient conditions under which the same estimator makes no mistakes with high probability. We prove an analogous result for an adapted version of this estimator that incorporates information on the covariance matrix of the perturbations.
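To make the estimator concrete, the following is a minimal sketch of a matching that minimizes the sum of squared Euclidean distances between the original points and their perturbed copies. It is an illustration under stated assumptions, not the paper's implementation: the function names (`squared_distance`, `least_squares_matching`, `simulate`), the Gaussian choice for positions and perturbations, and the parameter defaults are all hypothetical. The search is brute force over all permutations, so it is only practical for very small $n$; for larger instances a linear assignment solver such as `scipy.optimize.linear_sum_assignment` could plausibly be used instead.

```python
import itertools
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def least_squares_matching(xs, ys):
    """Return a tuple pi where xs[i] is matched to ys[pi[i]].

    The matching minimizes the sum of squared Euclidean distances.
    Brute force over all n! permutations: illustration only.
    """
    n = len(xs)
    cost = [[squared_distance(x, y) for y in ys] for x in xs]
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )


def simulate(n=6, d=2, noise=0.01, seed=0):
    """Count mistakes of the estimator on one synthetic instance.

    Assumption for illustration: standard Gaussian positions and
    independent Gaussian perturbations with standard deviation `noise`.
    """
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

    # Hidden matching: the perturbed copy of xs[i] is stored at ys[truth[i]].
    truth = list(range(n))
    rng.shuffle(truth)
    ys = [None] * n
    for i in range(n):
        ys[truth[i]] = [c + noise * rng.gauss(0.0, 1.0) for c in xs[i]]

    estimate = least_squares_matching(xs, ys)
    # Number of points assigned to the wrong perturbed copy.
    return sum(e != t for e, t in zip(estimate, truth))


if __name__ == "__main__":
    print("mistakes:", simulate())
```

Calling `simulate` with a larger `noise` value gives an informal sense of how the number of mistakes grows as perturbations become comparable to the spacing between points.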