We propose Noise-Based Spectral Embedding (NBSE), a physics-informed framework for selecting informative features from high-dimensional data without greedy search. NBSE constructs a sparse similarity graph on the samples and identifies the Nishimori temperature, that is, the critical inverse temperature $\beta_N$ at which the Bethe Hessian becomes singular. The eigenvector associated with the smallest eigenvalue of the Bethe Hessian at $\beta_N$ captures the dominant mode of an intrinsically degree-corrected diffusion process. This reweights nodes so that high-degree hubs do not dominate the embedding. By transposing the data matrix and applying NBSE in feature space, we obtain a one-dimensional spectral embedding that reveals groups of redundant or semantically related dimensions. Balanced binning of this embedding then selects one representative per group. We prove that coloured Gaussian perturbations shift $\beta_N$ by at most $O(\bar{\sigma}^2)$, which guarantees robustness to measurement noise. Experiments on ImageNet embeddings from MobileNetV2 and EfficientNet-B4 show that NBSE preserves classification accuracy under aggressive compression. On EfficientNet-B4, the accuracy drop is below $1\%$ when only $30\%$ of the features are retained, and NBSE outperforms the ANOVA $F$-test and random selection by up to $6.8\%$.
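The pipeline in the abstract can be sketched in a few lines of NumPy. The abstract does not fix several details, so this is a minimal illustration under assumptions of our own:

* The graph is a symmetrized $k$-nearest-neighbour graph weighted by absolute Pearson correlation.
* The Bethe Hessian takes the weighted form $H_\beta = I + \mathrm{diag}\big(\sum_j \sinh^2(\beta w_{ij})\big) - \big[\tfrac{1}{2}\sinh(2\beta w_{ij})\big]_{ij}$.
* $\beta_N$ is located by bisection on the sign of the smallest eigenvalue of $H_\beta$.
* Each bin is represented by its highest-variance feature.

All helper names (`knn_similarity_graph`, `bethe_hessian`, `smallest_eig`, `nishimori_beta`, `nbse_select`) are hypothetical and are not the authors' code.

```python
import numpy as np


def knn_similarity_graph(Z, k=10):
    """Symmetric k-NN graph on the rows of Z, weighted by |Pearson correlation|.

    Assumption: the similarity measure and k are illustrative, not NBSE's own.
    Requires k < number of rows.
    """
    Zc = Z - Z.mean(axis=1, keepdims=True)
    Zc /= np.linalg.norm(Zc, axis=1, keepdims=True) + 1e-12
    S = np.abs(Zc @ Zc.T)                          # |correlation| between rows
    np.fill_diagonal(S, 0.0)                       # no self-loops
    nbrs = np.argpartition(-S, k, axis=1)[:, :k]   # k most similar rows per node
    rows = np.repeat(np.arange(S.shape[0]), k)
    W = np.zeros_like(S)
    W[rows, nbrs.ravel()] = S[rows, nbrs.ravel()]
    return np.maximum(W, W.T)                      # symmetrize (union of neighbourhoods)


def bethe_hessian(W, beta):
    """Assumed weighted form: I + diag(sum_j sinh^2(beta w_ij)) - sinh(2 beta w_ij) / 2."""
    D = np.diag((np.sinh(beta * W) ** 2).sum(axis=1))
    return np.eye(W.shape[0]) + D - np.sinh(2.0 * beta * W) / 2.0


def smallest_eig(W, beta):
    """Smallest eigenvalue of the (symmetric) Bethe Hessian at inverse temperature beta."""
    return np.linalg.eigvalsh(bethe_hessian(W, beta))[0]


def nishimori_beta(W, tol=1e-6, max_doublings=30):
    """Bisection for a beta at which the smallest eigenvalue of H_beta crosses zero.

    At beta = 0, H_beta = I (smallest eigenvalue 1), so beta is doubled until the
    smallest eigenvalue turns negative, and the resulting bracket is then bisected.
    """
    lo, hi = 0.0, 1.0
    for _ in range(max_doublings):
        if smallest_eig(W, hi) < 0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("no sign change found; the graph may be too sparse")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smallest_eig(W, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nbse_select(X, n_keep, k=10):
    """Select n_keep feature indices from X (n_samples x n_features)."""
    if not 1 <= n_keep <= X.shape[1]:
        raise ValueError("n_keep must be between 1 and n_features")
    W = knn_similarity_graph(X.T, k=k)             # transpose: graph nodes are features
    beta_n = nishimori_beta(W)
    _, vecs = np.linalg.eigh(bethe_hessian(W, beta_n))
    v = vecs[:, 0]                                 # eigenvector of the smallest eigenvalue
    order = np.argsort(v)                          # 1-D spectral ordering of features
    bins = np.array_split(order, n_keep)           # balanced bins of (near-)equal size
    var = X.var(axis=0)
    # Hypothetical representative rule: keep the highest-variance feature in each bin.
    return np.sort([int(b[np.argmax(var[b])]) for b in bins])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 64))             # stand-in for an embedding matrix
    idx = nbse_select(X, n_keep=19)                # keep roughly 30% of 64 features
    print(idx)
```

Dense eigensolvers keep the sketch short. For wide embeddings, a sparse solver such as `scipy.sparse.linalg.eigsh` on a sparse $H_\beta$ would be the natural replacement. Bisection returns a zero crossing of the smallest eigenvalue inside the first bracket it finds. That crossing is $\beta_N$ only if the smallest eigenvalue crosses zero once on that interval, which this sketch assumes rather than checks.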