SpectralNet is a graph clustering method that uses a neural network to learn an embedding that separates the data. So far it has been used only with $k$-nn graphs, which are usually constructed using a distance metric (e.g., Euclidean distance). $k$-nn graphs force every point to have a fixed number of neighbors, regardless of the local statistics around it. We propose a new SpectralNet similarity metric based on random projection trees (rpTrees). Our experiments show that SpectralNet achieves better clustering accuracy with the rpTree similarity metric than with a $k$-nn graph built on a distance metric. We also found that the rpTree parameters, namely the leaf size and the choice of projection direction, do not affect the clustering accuracy. It is therefore computationally efficient to keep the leaf size on the order of $\log(n)$ and to project the points onto a random direction instead of searching for the direction of maximum dispersion.
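To make the rpTree similarity idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes several things the abstract does not state: points are split at the median of their projections, two points count as similar when they share a leaf, and similarities are averaged over several independent trees. The names `rptree_leaves`, `rptree_similarity`, and `n_trees` are hypothetical. Only the random projection direction and the leaf size on the order of $\log(n)$ come from the text.

```python
import numpy as np


def rptree_leaves(X, leaf_size, rng):
    """Partition the row indices of X into leaves of at most leaf_size points.

    Each split projects the points onto a random direction and cuts at the
    median rank (an assumption; the abstract only says the direction is random).
    """
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    leaves = []
    stack = [np.arange(X.shape[0])]
    while stack:
        idx = stack.pop()
        if idx.size <= leaf_size:
            leaves.append(idx)
            continue
        # Random projection direction; no search for maximum dispersion.
        direction = rng.standard_normal(X.shape[1])
        order = np.argsort(X[idx] @ direction)
        half = idx.size // 2
        stack.append(idx[order[:half]])
        stack.append(idx[order[half:]])
    return leaves


def rptree_similarity(X, n_trees=10, leaf_size=None, seed=0):
    """Return an n x n matrix whose (i, j) entry is the fraction of trees
    in which points i and j fall into the same leaf (assumed definition)."""
    n = X.shape[0]
    if leaf_size is None:
        # Leaf size on the order of log(n), as suggested in the text.
        leaf_size = max(2, int(np.ceil(np.log(n))))
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for _ in range(n_trees):
        for leaf in rptree_leaves(X, leaf_size, rng):
            W[np.ix_(leaf, leaf)] += 1.0
    W /= n_trees
    np.fill_diagonal(W, 0.0)  # drop self-similarity
    return W
```

Under these assumptions, the matrix returned by `rptree_similarity(X)` could stand in for the $k$-nn affinity matrix passed to a spectral clustering pipeline. Unlike a $k$-nn graph, the number of nonzero neighbors per point is not fixed; it depends on how often the point shares a leaf with others.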