Compressed representation of brain genetic transcription

The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. Established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at high compression ratios. Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods (PCA, kernel PCA, non-negative matrix factorization (NMF), t-distributed stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding), quantifying reconstruction fidelity, anatomical coherence, and predictive utility with respect to signalling, microstructural, and metabolic targets. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.


Related content

PCA


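In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two-, three-, or higher-dimensional space, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.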
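The description above can be illustrated with a minimal sketch, assuming NumPy is available. It computes principal components from the singular value decomposition of the centered data, which is one common way to obtain them. The helper name `pca`, the synthetic 3-D data, and the mixing matrix are all assumptions made for illustration; none of this comes from the paper or its Allen Brain Atlas data.

```python
import numpy as np

def pca(points, n_components):
    """Hypothetical helper: PCA of row-wise samples via SVD of the centered data."""
    centered = points - points.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Coordinates of each sample in the basis of principal components.
    scores = centered @ components.T
    explained_variance = singular_values[:n_components] ** 2 / (len(points) - 1)
    return components, scores, explained_variance

# Synthetic data (an assumption): 500 points lying close to a 2-D plane in 3-D space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.array([[3.0, 1.0, 0.5], [0.0, 1.0, -1.0]])
points = latent @ mixing + 0.05 * rng.normal(size=(500, 3))

components, scores, explained_variance = pca(points, n_components=2)

# Map the 2-D scores back to 3-D to measure what the compression lost.
reconstruction = scores @ components + points.mean(axis=0)
reconstruction_mse = np.mean((points - reconstruction) ** 2)

gram = components @ components.T        # identity if the basis is orthonormal
score_cov = np.cov(scores, rowvar=False)  # diagonal if the dimensions are uncorrelated
```

Here `gram` checks the orthogonality of the basis and `score_cov` checks that the projected dimensions are uncorrelated, the two properties named in the paragraph. `reconstruction_mse` is a simple stand-in for reconstruction fidelity at a given compression level.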
