We consider the problem of estimating fold-changes in the expected value of a multivariate outcome observed with unknown sample-specific and category-specific perturbations. This challenge arises in high-throughput sequencing studies of the abundance of microbial taxa because microbes are systematically over- and under-detected relative to their true abundances. Our model admits a partially identifiable estimand, and we establish full identifiability by imposing interpretable parameter constraints. To reduce bias and guarantee the existence of estimators in the presence of sparse observations, we apply an asymptotically negligible and constraint-invariant penalty to our estimating function. We develop a fast coordinate descent algorithm for estimation, and an augmented Lagrangian algorithm for estimation under null hypotheses. We construct a model-robust score test and demonstrate valid inference even for small sample sizes and violated distributional assumptions. The flexibility of the approach and comparisons to related methods are illustrated through a meta-analysis of microbial associations with colorectal cancer.
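The partial identifiability mentioned in the abstract can be illustrated with a small numerical sketch. The abstract does not state the model's functional form, so the log-linear mean used here, log E[Y_ij] = z_i + gamma_j + x_i * beta_j, is an assumption made purely for illustration, as are the names `expected_counts` and `center_by_median` and the choice of a median-zero constraint. Under that assumption, adding a constant to every category's log fold-change can be absorbed by the sample-specific terms, so the means are unchanged and only a constrained version of the fold-changes is pinned down.

```python
import numpy as np

def expected_counts(z, gamma, beta, x):
    """Mean matrix under the ASSUMED model log E[Y_ij] = z_i + gamma_j + x_i * beta_j."""
    return np.exp(z[:, None] + gamma[None, :] + np.outer(x, beta))

def center_by_median(beta):
    """One possible identifiability constraint (hypothetical choice): median(beta) = 0."""
    return beta - np.median(beta)

rng = np.random.default_rng(0)
n, J = 6, 5                                   # samples, categories (taxa)
x = rng.integers(0, 2, size=n).astype(float)  # binary covariate, e.g. case/control
z = rng.normal(size=n)                        # sample-specific perturbations
gamma = rng.normal(size=J)                    # category-specific perturbations
beta = rng.normal(size=J)                     # log fold-changes

# Shift every log fold-change by c and absorb the shift into the sample terms.
c = 1.7
beta_shifted = beta + c
z_shifted = z - c * x

mu = expected_counts(z, gamma, beta, x)
mu_shifted = expected_counts(z_shifted, gamma, beta_shifted, x)

# Two different parameter values give identical means...
same_means = np.allclose(mu, mu_shifted)
# ...but they agree once the constraint is imposed.
same_constrained = np.allclose(center_by_median(beta), center_by_median(beta_shifted))
```

In this sketch `same_means` and `same_constrained` both evaluate to true while `beta` and `beta_shifted` differ, which is the sense in which a constraint turns a partially identifiable estimand into a fully identifiable one. Other constraints, such as fixing a reference category's log fold-change at zero, would serve the same purpose.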