The architecture of the brain is too complex to survey intuitively without compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at high compression ratios. Using whole-brain, voxel-wise Allen Brain Atlas transcription data, we systematically compare compressed representations based on the most widely supported linear and non-linear methods (PCA, kernel PCA, non-negative matrix factorization (NMF), t-distributed stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding), quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.
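To make the trade-off between compression and reconstruction fidelity concrete, the following is a minimal sketch of how the PCA baseline might be scored. It is not the pipeline used in the study: the data are synthetic (a random low-rank matrix standing in for a voxels-by-genes expression matrix), the fidelity measure (relative squared reconstruction error) is an assumption, and the names `pca_reconstruction_error`, `n_voxels`, `n_genes`, and `true_rank` are hypothetical.

```python
import numpy as np


def pca_reconstruction_error(X, k):
    """Relative squared reconstruction error after keeping k principal components.

    Assumed fidelity measure for illustration only: residual sum of squares
    divided by the total sum of squares of the centred data.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data; the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]              # shape (k, n_features)
    codes = Xc @ components.T        # compressed representation, shape (n_samples, k)
    X_hat = codes @ components + mean
    return float(np.sum((X - X_hat) ** 2) / np.sum(Xc ** 2))


# Synthetic stand-in for a voxels-by-genes matrix (NOT Allen Brain Atlas data):
# a rank-5 signal plus a small amount of Gaussian noise.
rng = np.random.default_rng(0)
n_voxels, n_genes, true_rank = 200, 50, 5
latent = rng.normal(size=(n_voxels, true_rank))
mixing = rng.normal(size=(true_rank, n_genes))
X = latent @ mixing + 0.1 * rng.normal(size=(n_voxels, n_genes))

# Fewer retained components means higher compression and, typically, higher error.
errors = {k: pca_reconstruction_error(X, k) for k in (1, 2, 5, 10, 50)}
for k, err in errors.items():
    print(f"k={k:2d}  relative error={err:.4f}")
```

The same scoring function could in principle be applied to any method that provides both an encoding and a decoding step, such as NMF or an auto-encoder. t-SNE and UMAP in their basic forms do not define a decoder, so comparing them on reconstruction would need an additional choice that this sketch does not attempt to make.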