scVGAE: A Novel Approach using ZINB-Based Variational Graph Autoencoder for Single-Cell RNA-Seq Imputation

Single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to study differences between individual cells and to uncover distinctive cell characteristics. However, a significant technical challenge in scRNA-seq analysis is the occurrence of "dropout" events, in which the expression of certain genes goes undetected. This issue is particularly pronounced for genes with low or sparse expression, and it reduces the precision and interpretability of the resulting data. To address this challenge, various imputation methods have been developed to predict the missing values and so improve the accuracy and usefulness of the analysis. A prevailing hypothesis is that scRNA-seq data follow a zero-inflated negative binomial (ZINB) distribution, and several methods model the data accordingly. Deep learning approaches have recently emerged in scRNA-seq analysis. Some techniques, such as the variational autoencoder, use the ZINB distribution as the model's loss function. Graph-based methods such as Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) have also gained attention as deep learning methodologies for scRNA-seq analysis. This study introduces scVGAE, an approach that integrates GCN into a variational autoencoder framework and trains it with a ZINB loss function. This integration offers a promising way to address dropout events in scRNA-seq data and thereby improve the accuracy and reliability of downstream analyses. scVGAE outperforms other methods in cell clustering, achieving the best performance on 11 out of 14 datasets. An ablation study shows that all components of scVGAE are necessary. scVGAE is implemented in Python and is available for download at https://github.com/inoue0426/scVGAE.
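To make the ZINB loss mentioned in the abstract concrete, the following is a minimal pure-Python sketch of a ZINB negative log-likelihood for count data. It is not taken from the scVGAE repository: the function names and the parameterization (mean `mu`, dispersion `theta`, zero-inflation probability `pi`) are assumptions chosen for illustration, and a real model would compute the same quantity on tensors inside a deep learning framework.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of a single count x under ZINB(mu, theta, pi).

    pi is the probability of an extra ("dropout") zero, with 0 <= pi < 1.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero comes either from dropout or from the NB component itself.
        return -math.log(pi + (1.0 - pi) * math.exp(nb))
    # A non-zero count can only come from the NB component.
    return -(math.log(1.0 - pi) + nb)


def mean_zinb_nll(counts, mus, theta, pi):
    """Average ZINB negative log-likelihood over paired counts and predicted means."""
    losses = [zinb_nll(x, m, theta, pi) for x, m in zip(counts, mus)]
    return sum(losses) / len(losses)
```

In an imputation setting, `counts` would be the observed expression values and `mus` the means predicted by the decoder; minimizing `mean_zinb_nll` favors parameters under which the observed zeros and counts are jointly likely.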

