Single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to characterize differences between individual cells and to uncover cell-specific features. However, a significant technical challenge in scRNA-seq analysis is the occurrence of "dropout" events, in which the expression of a gene goes undetected. The problem is most pronounced for genes with low or sparse expression, and it reduces the precision and interpretability of the data. To address it, various imputation methods have been developed to predict these missing values and thereby improve the accuracy and usefulness of the analysis. A common assumption is that scRNA-seq data follow a zero-inflated negative binomial (ZINB) distribution, and several methods model the data accordingly. More recently, deep learning approaches have emerged in scRNA-seq analysis. Some of them, such as variational autoencoders, use a ZINB-based loss function. Graph-based deep learning methods such as Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT) have also gained attention. This study introduces scVGAE, which integrates GCN into a variational autoencoder framework and trains it with a ZINB loss function. This combination offers a promising way to handle dropout events in scRNA-seq data and, in turn, to improve the accuracy and reliability of downstream analyses. scVGAE outperforms other methods in cell clustering, achieving the best performance on 11 of 14 datasets. An ablation study shows that all components of scVGAE are necessary. scVGAE is implemented in Python and is available at https://github.com/inoue0426/scVGAE.
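To make the ZINB loss mentioned above concrete, the following is a minimal sketch of a ZINB negative log-likelihood in plain Python. It is not taken from the scVGAE code base. The function names and the parameterization (mean `mu`, dispersion `theta`, dropout probability `pi`) are assumptions chosen for illustration. In a model such as scVGAE, these parameters would typically be produced by the decoder and the loss evaluated in a deep learning framework rather than with scalar Python code.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under ZINB(mu, theta, pi),
    where 0 <= pi < 1 is the probability of a dropout zero."""
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # An observed zero is either a dropout or a genuine NB zero.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A nonzero count can only come from the NB component.
    return -(math.log(1.0 - pi) + log_nb)


def mean_zinb_nll(counts, mu, theta, pi):
    """Average the per-entry loss over a cells-by-genes matrix.
    All four arguments are nested lists of the same shape."""
    total, n = 0.0, 0
    for c_row, m_row, t_row, p_row in zip(counts, mu, theta, pi):
        for x, m, t, p in zip(c_row, m_row, t_row, p_row):
            total += zinb_nll(x, m, t, p)
            n += 1
    return total / n
```

For the same `mu` and `theta`, a larger `pi` assigns more probability to observed zeros, which is how the zero-inflation term accounts for dropout events.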