Incomplete covariate vectors are known to be problematic for estimation of and inference on model parameters, but their impact on prediction performance is less well understood. We develop an imputation-free method that builds on a random partition model admitting variable-dimension covariates. Cluster-specific response models further incorporate covariates via linear predictors, facilitating estimation of smooth prediction surfaces with relatively few clusters. We exploit the marginalization properties of Gaussian kernels to analytically project response distributions according to any pattern of missing covariates, yielding a local regression with internally consistent uncertainty propagation that uses only one set of coefficients per cluster. Aggressive shrinkage of these coefficients regulates the uncertainty due to missing covariates. The method allows in- and out-of-sample prediction for any missingness pattern, even if the pattern in a new subject's incomplete covariate vector was not seen in the training data. We develop an MCMC algorithm for posterior sampling that improves on a computationally expensive update for latent cluster allocation. Finally, we demonstrate the model's effectiveness for nonlinear point and density prediction under various circumstances by comparing it with other recent methods for regression with variable-dimension covariates on synthetic and real data.
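As a rough illustration of how Gaussian marginalization can project a cluster's response distribution onto whatever covariates happen to be observed, the sketch below assumes (hypothetically, not the paper's exact specification) that each cluster has a covariate kernel x ~ N(μ, Σ) and a linear Gaussian response y | x ~ N(β₀ + βᵀx, σ²). Integrating out the missing block x_m gives y | x_o ~ N(β₀ + β_oᵀx_o + β_mᵀE[x_m | x_o], σ² + β_mᵀCov(x_m | x_o)β_m), so one coefficient vector per cluster serves every missingness pattern. The functions `project_cluster`, `gaussian_logpdf`, `predict` and the cluster dictionary keys are illustrative names only, and fixed parameter values stand in for what would be posterior draws from the MCMC sampler.

```python
import numpy as np


def project_cluster(mu, Sigma, beta0, beta, sigma2, x, observed):
    """Return (mean, variance) of y given only the observed entries of x.

    Hypothetical cluster kernel: x ~ N(mu, Sigma), y | x ~ N(beta0 + beta @ x, sigma2).
    Missing covariates are integrated out analytically.
    """
    o = np.asarray(observed, dtype=bool)
    m = ~o
    if not m.any():  # complete covariate vector: ordinary linear predictor
        return beta0 + beta @ x, sigma2
    if o.any():
        S_oo = Sigma[np.ix_(o, o)]
        S_mo = Sigma[np.ix_(m, o)]
        K = np.linalg.solve(S_oo, S_mo.T).T  # K = S_mo S_oo^{-1}
        cond_mean = mu[m] + K @ (x[o] - mu[o])
        cond_cov = Sigma[np.ix_(m, m)] - K @ S_mo.T
    else:  # nothing observed: use the cluster's covariate marginal
        cond_mean, cond_cov = mu[m], Sigma[np.ix_(m, m)]
    mean = beta0 + beta[o] @ x[o] + beta[m] @ cond_mean
    # Missing covariates add beta_m' Cov(x_m | x_o) beta_m to the variance;
    # shrinking beta_m toward zero keeps this extra uncertainty small.
    var = sigma2 + beta[m] @ cond_cov @ beta[m]
    return mean, var


def gaussian_logpdf(z, mean, cov):
    """Log density of a multivariate normal evaluated at z."""
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (z.size * np.log(2.0 * np.pi) + logdet + quad)


def predict(clusters, x, observed):
    """Mixture prediction: weight each cluster by how well it explains x_o."""
    o = np.asarray(observed, dtype=bool)
    log_w, means, variances = [], [], []
    for c in clusters:
        lw = np.log(c["weight"])
        if o.any():
            lw += gaussian_logpdf(x[o], c["mu"][o], c["Sigma"][np.ix_(o, o)])
        mean, var = project_cluster(c["mu"], c["Sigma"], c["beta0"],
                                    c["beta"], c["sigma2"], x, o)
        log_w.append(lw)
        means.append(mean)
        variances.append(var)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())  # stabilized normalization
    w /= w.sum()
    means, variances = np.array(means), np.array(variances)
    pred_mean = w @ means
    # Mixture variance: E[Var] + Var[E]
    pred_var = w @ (variances + means**2) - pred_mean**2
    return w, pred_mean, pred_var


if __name__ == "__main__":
    Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
    mean, var = project_cluster(np.zeros(2), Sigma, 1.0, np.array([2.0, 3.0]),
                                0.25, np.array([1.0, np.nan]), [True, False])
    print(mean, var)  # approximately 4.5 and 7.0
```

In the usage example the second covariate is missing: its conditional mean 0.5 enters the linear predictor, and its conditional variance 0.75 inflates the response variance by 3² × 0.75. Replacing the coefficient 3.0 with a heavily shrunk value such as 0.1 would cut that inflation to about 0.0075, which is one way to read the abstract's point that aggressive shrinkage regulates uncertainty from missing covariates.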