Many data sets cannot be accurately described by standard probability distributions because they contain an excess of zero values. For example, zero-inflation is prevalent in microbiome data and single-cell RNA sequencing data, which serve as our real data examples. Several models have been proposed to address zero-inflated data sets, including the zero-inflated negative binomial model, the hurdle negative binomial model, and the truncated latent Gaussian copula model. This study compares these models and determines which one performs best under different conditions, using both simulation studies and real data analyses. We are particularly interested in how dependence among the variables, the level of zero-inflation or deflation, and the variance of the data affect model selection.
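To make the notion of zero-inflation concrete, the following is a minimal sketch, not the study's simulation design. It assumes a zero-inflated negative binomial in which a structural zero occurs with probability `pi_zero` and counts otherwise follow a negative binomial with NumPy's `(r, p)` parameterization. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def nb_zero_prob(r, p):
    """P(X = 0) under a negative binomial with r successes and success prob p."""
    return p ** r


def zinb_zero_prob(pi_zero, r, p):
    """P(X = 0) under a zero-inflated negative binomial (assumed mixture form)."""
    return pi_zero + (1.0 - pi_zero) * nb_zero_prob(r, p)


def sample_zinb(n, pi_zero, r, p, rng):
    """Draw n counts: a structural zero with prob pi_zero, else an NB(r, p) draw."""
    counts = rng.negative_binomial(r, p, size=n)
    structural_zero = rng.random(n) < pi_zero
    return np.where(structural_zero, 0, counts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical parameter values, for illustration only.
    pi_zero, r, p = 0.3, 2.0, 0.4
    x = sample_zinb(100_000, pi_zero, r, p, rng)
    print("empirical zero fraction:", np.mean(x == 0))
    print("ZINB zero probability:  ", zinb_zero_prob(pi_zero, r, p))
    print("plain NB zero probability:", nb_zero_prob(r, p))
```

In this mixture form the zero probability can only exceed that of the plain negative binomial. A hurdle model instead sets the zero probability directly and draws positive counts from a zero-truncated distribution, so it can also represent zero-deflation.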