Mixture-of-Experts (MoE) is a flexible framework that combines multiple specialized submodels (``experts'') by assigning covariate-dependent weights (``gating functions'') to each expert, and it has been commonly used for analyzing heterogeneous data. Existing statistical MoE formulations typically assume constant coefficients for covariate effects within the expert or gating models, which can be inadequate for longitudinal, spatial, or other dynamic settings where covariate influences and latent subpopulation structure evolve across a known dimension. We propose a Varying-Coefficient Mixture of Experts (VCMoE) model that allows all coefficients in both the gating functions and expert models to vary along an indexing variable. We establish identifiability and consistency of the proposed model, and develop an estimation procedure, a label-consistent EM algorithm, for both fully functional and hybrid specifications, along with the corresponding asymptotic distributions of the resulting estimators. For inference, simultaneous confidence bands are constructed in two ways: from asymptotic theory for the maximum discrepancy between the estimated functional coefficients and their true counterparts, and by bootstrap methods. In addition, a generalized likelihood ratio test is developed to examine whether a coefficient function genuinely varies across the index variable. Simulation studies demonstrate good finite-sample performance, with acceptable bias and satisfactory coverage rates. We illustrate the proposed VCMoE model using a dataset of single-nucleus gene expression in embryonic mice to characterize the temporal dynamics of the associations between the expression levels of genes Satb2 and Bcl11b across two latent cell subpopulations of neurons, yielding results that are consistent with prior findings.
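To make the model structure concrete, the following is a minimal sketch of a two-expert varying-coefficient mixture density, not the authors' implementation. It assumes Gaussian experts with linear means, a softmax gate with expert 0 as the reference class, a single covariate `x`, and an index `t` in [0, 1]. The coefficient curves in `gating_coefs` and `expert_coefs`, the fixed standard deviations, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def gating_coefs(t):
    """Hypothetical gating coefficients alpha_k(t), shape (n, K=2, p=2).

    Expert 0 is the reference class, so its coefficients stay at zero.
    """
    alpha = np.zeros((t.size, 2, 2))
    alpha[:, 1, 0] = np.sin(2 * np.pi * t)  # intercept curve
    alpha[:, 1, 1] = 1.0 - t                # slope curve
    return alpha

def expert_coefs(t):
    """Hypothetical expert coefficients beta_k(t), shape (n, K=2, p=2)."""
    beta = np.empty((t.size, 2, 2))
    beta[:, 0, 0] = 1.0 + t
    beta[:, 0, 1] = np.cos(np.pi * t)
    beta[:, 1, 0] = -1.0
    beta[:, 1, 1] = 2.0 * t
    return beta

def vcmoe_density(y, x, t, sigma=(0.5, 0.5)):
    """Return the mixture density f(y | x, t) and the gate weights pi_k(x, t)."""
    y, x, t = (np.asarray(a, dtype=float) for a in (y, x, t))
    design = np.column_stack([np.ones_like(x), x])  # (n, 2): intercept and x
    # Gate weights: softmax over the per-expert linear scores at index t.
    gate = softmax(np.einsum("nkp,np->nk", gating_coefs(t), design))
    # Expert means: linear in x with coefficients evaluated at index t.
    mean = np.einsum("nkp,np->nk", expert_coefs(t), design)
    sd = np.asarray(sigma, dtype=float)
    # Gaussian density of y under each expert, shape (n, K).
    comp = np.exp(-0.5 * ((y[:, None] - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return (gate * comp).sum(axis=1), gate
```

Evaluating `vcmoe_density` at the same `x` for several values of `t` shows both the gate weights and the expert means shifting with the index, which is the behavior a constant-coefficient MoE cannot represent. Estimation (the label-consistent EM algorithm) and inference are not sketched here.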