Mahalanobis metrics are widely used in machine learning in conjunction with methods such as $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering. Despite their importance, there has been no prior work on applying sketching techniques to speed up algorithms for Mahalanobis metrics. In this paper, we initiate the study of dimension reduction for Mahalanobis metrics. In particular, we provide efficient data structures for solving the Approximate Distance Estimation (ADE) problem for Mahalanobis distances. We first provide a randomized Monte Carlo data structure. We then show how to adapt it to obtain our main data structure, which can handle sequences of \textit{adaptive} queries as well as online updates to both the Mahalanobis metric matrix and the data points, making it suitable for use with prior algorithms for online learning of Mahalanobis metrics.
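To make the ADE setting concrete, the following is a minimal sketch of one standard way a Monte Carlo distance estimator for a Mahalanobis metric could be built. It is an illustration under stated assumptions, not the construction from the paper: it assumes the metric is given by a positive semidefinite matrix $A$, factors it as $A = U^\top U$, and applies a Gaussian Johnson-Lindenstrauss projection $\Pi$ to $Ux$. The names `mahalanobis`, `SketchedMahalanobis`, and the sketch size `k` are hypothetical. The sketch gives no guarantee against adaptive queries and does not support online updates.

```python
import numpy as np


def mahalanobis(A, x, y):
    """Exact Mahalanobis distance sqrt((x - y)^T A (x - y)) for PSD A."""
    diff = x - y
    return float(np.sqrt(diff @ A @ diff))


class SketchedMahalanobis:
    """Illustrative Monte Carlo estimator (assumed construction, not the paper's).

    Stores k-dimensional sketches of the data points so that a query
    costs O(n * k) for the distance comparisons instead of O(n * d).
    """

    def __init__(self, A, points, k, seed=0):
        rng = np.random.default_rng(seed)
        d = A.shape[0]
        # Factor A = U^T U through an eigendecomposition; clipping guards
        # against tiny negative eigenvalues caused by round-off.
        w, V = np.linalg.eigh(A)
        w = np.clip(w, 0.0, None)
        self.U = (V * np.sqrt(w)).T
        # Gaussian JL matrix, scaled so squared norms are preserved in expectation.
        Pi = rng.standard_normal((k, d)) / np.sqrt(k)
        # Combined k x d map x -> Pi U x.
        self.M = Pi @ self.U
        # One k-dimensional sketch per data point (rows of `points`).
        self.sketches = points @ self.M.T

    def query(self, q):
        """Return estimated Mahalanobis distances from q to every stored point."""
        return np.linalg.norm(self.sketches - self.M @ q, axis=1)
```

Since $\|U(x - y)\|_2^2 = (x - y)^\top A (x - y)$, the Mahalanobis distance is a Euclidean distance after the linear map $U$, which is why a Euclidean sketch applies. The random matrix is fixed at construction time, so the accuracy guarantee of such a sketch holds only for queries chosen independently of it.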