Metric learning aims to find a suitable distance metric over the input space in order to improve the performance of distance-based learning algorithms. In high-dimensional settings, metric learning can also serve as dimensionality reduction, by imposing a low-rank restriction on the learnt metric. In this paper, instead of training a low-rank metric on high-dimensional data, we consider a randomly compressed version of the data and train a full-rank metric there. We give theoretical guarantees on the error of distance-based metric learning with respect to the random compression, and these guarantees do not depend on the ambient dimension. Our bounds make no explicit assumptions beyond i.i.d. data from a bounded support, and they tighten automatically when benign geometrical structure is present. Experimental results on both synthetic and real data sets support our theoretical findings in high-dimensional settings.
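To make the compress-then-learn idea concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes a Gaussian random projection as the compression step, and it uses a regularised inverse covariance as a stand-in for a learnt metric, since the abstract does not specify the learning algorithm. The names `compress`, `placeholder_full_rank_metric`, and `mahalanobis_sq`, and the parameters `k`, `seed`, and `ridge`, are all hypothetical.

```python
import numpy as np


def compress(X, k, seed=0):
    """Project the rows of X from d to k dimensions.

    Assumption: a Gaussian random matrix with entries N(0, 1/k).
    The abstract only says "randomly compressed".
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, d))
    return X @ R.T


def placeholder_full_rank_metric(Z, ridge=1e-3):
    """Return a k x k positive definite matrix M for the compressed data Z.

    Stand-in only: a regularised inverse covariance, not a supervised
    metric learner. Any method that returns a full-rank k x k metric
    could be substituted here.
    """
    k = Z.shape[1]
    cov = np.cov(Z, rowvar=False) + ridge * np.eye(k)
    return np.linalg.inv(cov)


def mahalanobis_sq(u, v, M):
    """Squared distance (u - v)^T M (u - v) under the metric M."""
    diff = u - v
    return float(diff @ M @ diff)


# Synthetic data with bounded support: n = 200 points in d = 1000 dimensions.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 1000))

Z = compress(X, k=10)                  # compressed data, shape (200, 10)
M = placeholder_full_rank_metric(Z)    # full-rank metric, shape (10, 10)
dist = mahalanobis_sq(Z[0], Z[1], M)   # distance computed in the compressed space
```

The point of the sketch is the shape of the pipeline: the metric has size k x k rather than d x d, so its cost depends on the compressed dimension `k` and not on the ambient dimension `d`.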