Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending Gaussian process factor models to characterize multi-population interactions has grown. However, the cubic runtime scaling of current methods with the length of experimental trials and the number of recorded populations (groups) precludes their application to large-scale multi-population recordings. Here, we improve this scaling from cubic to linear in both trial length and group number. We present two approximate approaches to fitting multi-group Gaussian process factor models based on (1) inducing variables and (2) the frequency domain. Empirically, both methods achieved orders of magnitude speed-up with minimal impact on statistical performance, in simulation and on neural recordings of hundreds of neurons across three brain areas. The frequency domain approach, in particular, consistently provided the greatest runtime benefits with the fewest trade-offs in statistical performance. We further characterize the estimation biases introduced by the frequency domain approach and demonstrate effective strategies to mitigate them. This work enables a powerful class of analysis techniques to keep pace with the growing scale of multi-population recordings, opening new avenues for exploring brain function.
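As a rough illustration of why frequency-domain computations can avoid cubic cost, the sketch below solves a linear system and evaluates a log-determinant for a Gaussian process covariance using the FFT. It is not the fitting procedure described above, which the abstract does not specify. It assumes a single latent process, evenly spaced time points, a squared exponential kernel, and a periodic (circulant) covariance, under which the discrete Fourier transform diagonalizes the covariance exactly. All names and parameter values (`se_kernel`, `length_scale`, `noise_var`, `T`) are hypothetical.

```python
import numpy as np

def se_kernel(lags, length_scale):
    """Squared exponential kernel evaluated at the given time lags."""
    return np.exp(-0.5 * (lags / length_scale) ** 2)

T = 256              # number of time points per trial (assumed)
length_scale = 5.0   # GP timescale in units of time steps (assumed)
noise_var = 0.1      # independent noise variance added to the diagonal (assumed)

# Lags measured on a ring, so the covariance is circulant (periodic assumption).
t = np.arange(T)
circ_lags = np.minimum(t, T - t)
first_col = se_kernel(circ_lags, length_scale)
first_col[0] += noise_var

# The eigenvalues of a circulant matrix are the DFT of its first column.
# The column is real and symmetric, so the DFT is real.
eigvals = np.fft.fft(first_col).real

rng = np.random.default_rng(0)
y = rng.standard_normal(T)

# O(T log T): solve C x = y and compute log|C| in the frequency domain.
x_fft = np.fft.ifft(np.fft.fft(y) / eigvals).real
logdet_fft = np.sum(np.log(eigvals))

# O(T^3) reference: build the dense circulant matrix and use direct methods.
C = first_col[(t[:, None] - t[None, :]) % T]
x_dense = np.linalg.solve(C, y)
logdet_dense = np.linalg.slogdet(C)[1]

max_abs_diff = np.max(np.abs(x_fft - x_dense))
logdet_diff = abs(logdet_fft - logdet_dense)
print(max_abs_diff, logdet_diff)
```

The two routes agree to numerical precision here only because the covariance was constructed to be circulant. Covariances over finite, non-periodic trials are Toeplitz rather than circulant, so treating them as circulant is an approximation.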