The global dimensionality of a neural representation manifold provides rich insight into the computational processes underlying both artificial and biological neural networks. However, all existing measures of global dimensionality are sensitive to the number of samples, i.e., the number of rows and columns of the sample matrix. In particular, we show that the participation ratio of eigenvalues, a popular measure of global dimensionality, is highly biased at small sample sizes, and we propose a bias-corrected estimator that is more accurate with finite samples and in the presence of noise. On synthetic data, we demonstrate that our estimator recovers the true, known dimensionality. We apply our estimator to neural recordings from the brain, including calcium imaging, electrophysiological recordings, and fMRI data, and to the neural activations of a large language model, and show that it is invariant to sample size. Finally, our estimator can also be used to measure the local dimensionalities of curved neural manifolds by weighting the finite samples appropriately.
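
To make the sample-size sensitivity concrete, the following is a minimal sketch of the standard (uncorrected) participation ratio, PR = (Σλᵢ)² / Σλᵢ², computed from the eigenvalues λᵢ of the sample covariance matrix. It is not the bias-corrected estimator proposed here, whose form the abstract does not give. The function name `participation_ratio`, the isotropic Gaussian test data, and the chosen sizes are assumptions made purely for illustration.

```python
import numpy as np


def participation_ratio(X):
    """Naive participation ratio of a sample matrix X (n_samples x n_features).

    Uses PR = tr(C)^2 / tr(C^2), which equals (sum of eigenvalues)^2 divided by
    the sum of squared eigenvalues of the sample covariance C.
    """
    Xc = X - X.mean(axis=0, keepdims=True)   # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance matrix
    # For a symmetric matrix, tr(C^2) is the sum of its squared entries.
    return np.trace(cov) ** 2 / np.sum(cov ** 2)


# Hypothetical test case: isotropic Gaussian data in 50 dimensions,
# for which the population participation ratio is exactly 50.
rng = np.random.default_rng(0)
n_features = 50
pr_small = participation_ratio(rng.standard_normal((20, n_features)))
pr_large = participation_ratio(rng.standard_normal((5000, n_features)))

print(f"PR with 20 samples:   {pr_small:.1f}")
print(f"PR with 5000 samples: {pr_large:.1f}")
```

With only 20 samples the centered covariance has rank at most 19, so the naive estimate cannot exceed 19 no matter what the true dimensionality is, whereas the 5000-sample estimate approaches the population value of 50. This gap is the kind of small-sample bias that a corrected estimator is meant to remove.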