We consider a complex data modeling problem motivated by the zero-inflated and overdispersed data that arise in microbiome studies. Understanding how microbiome abundance is associated with human biological features, such as BMI, is of great importance for host health. Methods based on parametric distributional assumptions, such as zero-inflated Poisson and zero-inflated negative binomial regression, have been widely used to model such data, yet these assumptions are restrictive and hard to verify in real-world applications. We relax them and propose a semiparametric single-index quantile regression model. The model is flexible enough to accommodate a wide range of possible association functions and adapts to zero proportions that vary across subjects, thereby avoiding the strong parametric distributional assumptions of most existing approaches to zero-inflated data. We establish the asymptotic properties of the index coefficient estimator and of the estimated quantile regression curve. Through extensive simulation studies, we demonstrate the superior performance of the proposed method in terms of model fit.