A Shared Nearest Neighbor (SNN) graph is a graph built from shared nearest neighbor information, a secondary similarity measure based on the rankings induced by a primary $k$-nearest neighbor ($k$-NN) measure. SNN measures are often claimed to be less prone to the curse of dimensionality than conventional distance measures, and methods based on SNN graphs are therefore widely used in applications, particularly for clustering high-dimensional data sets and for finding outliers in subspaces of high-dimensional data. Despite this, SNN graphs and their graph Laplacians have not yet been studied theoretically. This work makes a first contribution in that direction. We show that, in the large-sample limit, the SNN graph Laplacian converges to a consistent continuum limit, and that this limit is the same as that of a $k$-NN graph Laplacian. Moreover, we show that, with high probability, the pointwise convergence rate of the graph Laplacian is linear with respect to $(k/n)^{1/m}$.
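To make the construction concrete, the following is a minimal sketch of an SNN graph and an associated graph Laplacian. It rests on assumptions not fixed by the abstract: the primary measure is Euclidean distance, the SNN weight between two points is taken to be the number of $k$-nearest neighbors they share (one common choice), and the Laplacian is the unnormalized $L = D - W$ with no scaling factors applied. The function names `knn_sets`, `snn_weights`, and `unnormalized_laplacian` are hypothetical and chosen only for illustration.

```python
import math
import random


def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors (self excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to point i (the primary measure).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_weights(points, k):
    """Assumed SNN similarity: w[i][j] = number of k-nearest neighbors shared by i and j."""
    nn = knn_sets(points, k)
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = len(nn[i] & nn[j])
    return w


def unnormalized_laplacian(w):
    """Return L = D - W, where D is the diagonal matrix of row sums of W."""
    n = len(w)
    return [
        [(sum(w[i]) if i == j else 0) - w[i][j] for j in range(n)]
        for i in range(n)
    ]


# Illustrative data: 50 points sampled uniformly from the unit square.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(50)]
W = snn_weights(points, k=8)
L = unnormalized_laplacian(W)
```

The weights depend on the data only through neighbor rankings, which is what makes SNN a secondary similarity measure. The brute-force neighbor search is quadratic in the number of points and is meant only to illustrate the definition.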