In the study of high-dimensional data, it is often assumed that the data set possesses an underlying lower-dimensional structure. A practical model for this structure is an embedded compact manifold with boundary. Since the underlying manifold structure is typically unknown, identifying boundary points from data distributed on the manifold is crucial for various applications. In this work, we propose a method for detecting boundary points inspired by the widely used locally linear embedding algorithm. We implement this method using two nearest neighbor search schemes: the $\epsilon$-radius ball scheme and the $K$-nearest neighbor scheme. The algorithm incorporates geometric information about the data structure, particularly through its close relation to the local covariance matrix. We discuss the selection of the key parameter and analyze the algorithm by studying the spectral properties of the local covariance matrix under both neighborhood search schemes. Finally, we demonstrate the algorithm's performance on simulated examples.
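To make the two neighborhood search schemes and the local covariance matrix more concrete, the following Python sketch may help. It is only an illustration under stated assumptions, not the boundary detection algorithm proposed in this work: the function names are hypothetical, and centering the covariance at the query point and normalizing by the neighbor count are choices made here for simplicity.

```python
import numpy as np


def eps_ball_neighbors(X, k, eps):
    """Indices of points within distance eps of X[k], excluding k itself."""
    d = np.linalg.norm(X - X[k], axis=1)
    idx = np.flatnonzero(d <= eps)
    return idx[idx != k]


def knn_neighbors(X, k, K):
    """Indices of the K points closest to X[k], excluding k itself."""
    d = np.linalg.norm(X - X[k], axis=1)
    d[k] = np.inf  # never return the query point as its own neighbor
    return np.argsort(d)[:K]


def local_covariance(X, k, idx):
    """Average outer product of neighbor offsets from X[k].

    Assumption: centered at the query point X[k] and divided by the
    number of neighbors; other normalizations are possible.
    """
    if len(idx) == 0:
        raise ValueError("empty neighborhood; increase eps or K")
    Y = X[idx] - X[k]
    return Y.T @ Y / len(idx)


# Toy data: points sampled uniformly from the unit disk, embedded in R^3.
rng = np.random.default_rng(0)
n = 2000
r = np.sqrt(rng.uniform(size=n))
t = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.column_stack([r * np.cos(t), r * np.sin(t), np.zeros(n)])

k = 0  # index of the query point
C_eps = local_covariance(X, k, eps_ball_neighbors(X, k, eps=0.2))
C_knn = local_covariance(X, k, knn_neighbors(X, k, K=50))

# Eigenvalues in ascending order; their pattern reflects local geometry.
print("eps-ball spectrum:", np.linalg.eigvalsh(C_eps))
print("KNN spectrum:     ", np.linalg.eigvalsh(C_knn))
```

In this toy setting the disk is flat, so the eigenvalue associated with the third coordinate is numerically zero under both schemes. How the remaining eigenvalues behave near the boundary, and how that informs the key parameter, is the subject of the analysis described above.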