Nonparametric density estimation for compositional data supported on the simplex is examined under a missing-at-random (MAR) mechanism. Rather than imputing missing values and estimating the density from a completed data set, we adopt a strategy based on inverse probability weighting. The proposed estimator uses an adaptive Dirichlet kernel, which ensures nonnegativity on the simplex and favorable behavior near the boundary. When the observation probabilities are unknown, they are estimated through a Nadaraya-Watson regression step. The large-sample properties of the estimator are derived, including pointwise bias and variance expansions, optimal smoothing rates, and asymptotic normality. A simulation study investigates its finite-sample performance under varying sample sizes and missing rates. For certain target densities, the proposed estimator outperforms inverse-probability-weighted kernel density estimators based on additive and isometric log-ratio transformations of the data. The methodology is further illustrated through an application to leukocyte composition data from the National Health and Nutrition Examination Survey (NHANES), which allows for the identification of the modal immune profile in the sampled population.
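To make the two-step construction concrete, the following is a minimal Python sketch of an inverse-probability-weighted Dirichlet kernel estimator with Nadaraya-Watson propensities. It is an illustration under stated assumptions, not the estimator as defined in the paper: the abstract does not give the kernel parameterization, so the sketch assumes the common choice of Dirichlet parameters `s / b + 1` at evaluation point `s` with bandwidth `b`, a scalar always-observed covariate `y` that drives the missingness, a Gaussian kernel with bandwidth `h` in the regression step, and normalization by the full sample size `n`. All function and variable names are hypothetical.

```python
import math
import numpy as np


def dirichlet_log_pdf(x, alpha):
    """Log-density of Dirichlet(alpha) at each row of x (rows sum to 1)."""
    log_norm = math.lgamma(float(alpha.sum())) - sum(math.lgamma(float(a)) for a in alpha)
    return log_norm + np.log(x) @ (alpha - 1.0)


def nw_propensity(y, delta, h):
    """Nadaraya-Watson estimate of P(delta = 1 | Y = y_i), Gaussian kernel (assumed)."""
    w = np.exp(-0.5 * ((y[:, None] - y[None, :]) / h) ** 2)
    return (w @ delta) / w.sum(axis=1)


def ipw_dirichlet_kde(s, x, delta, pi_hat, b):
    """IPW Dirichlet kernel density estimate at the composition s.

    Only rows with delta == 1 are used; each is weighted by 1 / pi_hat.
    The kernel parameters s / b + 1 are an assumption of this sketch.
    """
    alpha = s / b + 1.0
    obs = delta == 1
    kernel_vals = np.exp(dirichlet_log_pdf(x[obs], alpha))
    return float(np.sum(kernel_vals / pi_hat[obs]) / len(delta))


# Small synthetic demo (simulated data, not from the paper).
rng = np.random.default_rng(0)
n = 400
x_full = rng.dirichlet([2.0, 3.0, 4.0], size=n)      # compositions on the 2-simplex
y = x_full[:, 0] + 0.1 * rng.normal(size=n)          # always-observed covariate
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * y)))     # missingness depends on y only
delta = (rng.random(n) < pi_true).astype(float)      # 1 = composition observed
x = np.where(delta[:, None] == 1, x_full, np.nan)    # hide the unobserved rows

pi_hat = nw_propensity(y, delta, h=0.1)
s0 = np.array([0.2, 0.3, 0.5])
f_hat = ipw_dirichlet_kde(s0, x, delta, pi_hat, b=0.05)
print(f"estimated density at {s0}: {f_hat:.3f}")
```

Because every kernel value is a Dirichlet density and every weight is positive, the estimate is nonnegative everywhere on the simplex, which is the property the abstract attributes to the Dirichlet kernel. Dividing by the sum of the inverse-probability weights instead of `n` would give a self-normalized variant; which form the paper uses is not stated in the abstract.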