Metricizing the Euclidean Space towards Desired Distance Relations in Point Clouds

Given a set of points in the Euclidean space $\mathbb{R}^\ell$ with $\ell>1$, the pairwise distances between the points are determined by their spatial location and the metric $d$ that we endow $\mathbb{R}^\ell$ with. Hence, the distance $d(\mathbf x,\mathbf y)=\delta$ between two points is fixed by the choice of $\mathbf x$, $\mathbf y$, and $d$. We study the related problem in which the value $\delta$ and the points $\mathbf x,\mathbf y$ are fixed, and ask whether there is a topological metric $d$ that yields the desired distance $\delta$. We show that this problem is solvable by constructing a metric that simultaneously realizes desired pairwise distances between up to $O(\sqrt\ell)$ points in $\mathbb{R}^\ell$. We then introduce the notion of an $\varepsilon$-semimetric $\tilde{d}$ to formulate our main result: for all $\varepsilon>0$, for all $m\geq 1$, for any choice of $m$ points $\mathbf y_1,\ldots,\mathbf y_m\in\mathbb{R}^\ell$, and for any choice of values $\{\delta_{ij}\geq 0: 1\leq i<j\leq m\}$, there exists an $\varepsilon$-semimetric $\tilde{d}:\mathbb{R}^\ell\times \mathbb{R}^\ell\to\mathbb{R}$ such that $\tilde{d}(\mathbf y_i,\mathbf y_j)=\delta_{ij}$, i.e., the desired distances are achieved irrespective of the topology that the Euclidean or other norms would induce. We showcase our results by using them to attack unsupervised learning algorithms, specifically $k$-Means and density-based (DBSCAN) clustering. These algorithms have many applications in artificial intelligence, and running them with externally provided distance measures constructed as shown here can make them produce results that are pre-determined and hence malleable. This demonstrates that the results of clustering algorithms may not generally be trustworthy unless there is a standardized and fixed prescription to use a specific distance function.
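
To make the attack scenario concrete, the following minimal Python sketch shows how an externally supplied distance function can steer a clustering result. It is an illustration under stated assumptions, not the construction of this paper: `make_override_distance` is a hypothetical helper that simply looks up prescribed values $\delta_{ij}$ for chosen point pairs and falls back to the Euclidean distance otherwise, and it is not checked against the triangle inequality or any $\varepsilon$-semimetric property. The hypothetical `threshold_clusters` is a simplified DBSCAN-like procedure in which every point counts as a core point, so clusters are the connected components of the graph linking points at distance at most `eps`.

```python
import math


def euclidean(p, q):
    """Ordinary Euclidean distance between two points given as tuples."""
    return math.dist(p, q)


def make_override_distance(base, targets):
    """Hypothetical helper: report targets[(p, q)] for listed pairs, else base(p, q).

    Assumption: this lookup table only stands in for a constructed distance;
    it is symmetric but not verified to satisfy the triangle inequality.
    """
    table = {}
    for (p, q), value in targets.items():
        table[(p, q)] = value
        table[(q, p)] = value  # enforce symmetry

    def dist(p, q):
        if p == q:
            return 0.0
        return table.get((p, q), base(p, q))

    return dist


def threshold_clusters(points, dist, eps):
    """Label connected components of the graph with edges dist(p, q) <= eps."""
    labels = {}
    next_label = 0
    for start in points:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            p = stack.pop()
            for q in points:
                if q not in labels and dist(p, q) <= eps:
                    labels[q] = next_label
                    stack.append(q)
        next_label += 1
    return [labels[p] for p in points]


# Two spatially obvious groups: {a, b} near the origin, {c, d} far to the right.
a, b, c, d = (0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)
points = [a, b, c, d]

# Prescribed distances that pull a towards c and b towards d.
targets = {(a, b): 50.0, (c, d): 50.0, (a, c): 1.0, (b, d): 1.0}
crafted = make_override_distance(euclidean, targets)

honest_labels = threshold_clusters(points, euclidean, eps=2.0)
crafted_labels = threshold_clusters(points, crafted, eps=2.0)

print(honest_labels)   # [0, 0, 1, 1]
print(crafted_labels)  # [0, 1, 0, 1]
```

With the Euclidean distance the two spatial groups are recovered, whereas the crafted distance pairs `a` with `c` and `b` with `d`, so the grouping is dictated by whoever supplies the distance function rather than by the point locations.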
