Real-world measurements often comprise a dominant signal contaminated by a noisy background. Robustly estimating the dominant signal is a fundamental statistical problem in practice. Classically, mixture models have been used to cluster a heterogeneous population into homogeneous components. Modeling such data with fully parametric models risks bias under misspecification, while fully nonparametric approaches can waste statistical power and computational resources. We propose a middle path: a semiparametric method that models only the dominant component parametrically and leaves the background completely nonparametric, while remaining computationally scalable and statistically robust. Rather than downweighting outliers, as is traditional in the robust statistics literature, we maximize the observed-data likelihood so that the noisy background is absorbed by the nonparametric component. Computationally, we propose a new approximate likelihood maximization algorithm accelerated by the fast Fourier transform (FFT). Empirically, this FFT plug-in achieves order-of-magnitude speedups over vanilla weighted expectation-maximization (EM) while preserving statistical accuracy and large-sample properties.
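The abstract does not specify the algorithm, so what follows is only a minimal sketch of one way an FFT plug-in can speed up a weighted EM of this kind, under our own assumptions: a Gaussian dominant component, a background density estimated by a Gaussian kernel density estimate (KDE) weighted by the current background responsibilities, and nearest-grid binning. The names `gaussian_pdf`, `direct_kde`, `binned_fft_kde`, and `weighted_em_step` are hypothetical. The costly part of each iteration is evaluating the weighted KDE at all n data points, which takes O(n²) operations if done directly. Binning the weights onto an m-point grid and convolving with the kernel by FFT reduces this to roughly O(n + m log m), at the cost of a small binning error.

```python
import numpy as np

# Hypothetical sketch: Gaussian dominant component + weighted Gaussian KDE
# background. These modeling choices are assumptions, not the authors' algorithm.


def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def direct_kde(x, w, bandwidth):
    """Slow baseline: O(n^2) weighted Gaussian KDE evaluated at the data points."""
    diffs = x[:, None] - x[None, :]  # diffs[i, j] = x_i - x_j
    return (gaussian_pdf(diffs, 0.0, bandwidth) @ w) / w.sum()


def binned_fft_kde(x, w, bandwidth, n_grid=4096):
    """Approximate weighted Gaussian KDE at the data points via binning + FFT."""
    lo, hi = x.min() - 4 * bandwidth, x.max() + 4 * bandwidth
    grid = np.linspace(lo, hi, n_grid)
    delta = grid[1] - grid[0]
    # Nearest-grid-point binning of the weights: O(n).
    idx = np.clip(np.rint((x - lo) / delta).astype(int), 0, n_grid - 1)
    mass = np.bincount(idx, weights=w, minlength=n_grid) / w.sum()
    # Kernel evaluated on every grid offset -(m-1)..(m-1), so nothing is truncated.
    kernel = gaussian_pdf(np.arange(-(n_grid - 1), n_grid) * delta, 0.0, bandwidth)
    # Zero-padded (linear, not circular) convolution via real FFTs: O(m log m).
    n_fft = 3 * n_grid - 2
    conv = np.fft.irfft(np.fft.rfft(mass, n_fft) * np.fft.rfft(kernel, n_fft), n_fft)
    # Entries m-1 .. 2m-2 of the full convolution line up with the grid points;
    # clip tiny negative values caused by floating-point round-off.
    dens_grid = np.maximum(conv[n_grid - 1 : 2 * n_grid - 1], 0.0)
    # Linear interpolation back to the data points.
    return np.interp(x, grid, dens_grid)


def weighted_em_step(x, pi, mu, sigma, tau, bandwidth, kde=binned_fft_kde):
    """One EM step for f(x) = pi * N(x; mu, sigma^2) + (1 - pi) * g(x)."""
    # Background density g: KDE weighted by current background responsibilities.
    g = kde(x, 1.0 - tau, bandwidth)
    # E-step: posterior probability that each point belongs to the dominant component.
    num = pi * gaussian_pdf(x, mu, sigma)
    tau = num / np.maximum(num + (1.0 - pi) * g, 1e-300)
    # M-step: weighted Gaussian MLE for the dominant component and its weight.
    pi = tau.mean()
    mu = np.sum(tau * x) / tau.sum()
    sigma = np.sqrt(np.sum(tau * (x - mu) ** 2) / tau.sum())
    return pi, mu, sigma, tau


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated data (illustrative only): 80% N(0, 1) signal, 20% uniform background.
    x = np.concatenate([rng.normal(0.0, 1.0, 800), rng.uniform(-10.0, 10.0, 200)])
    h = 0.5  # bandwidth picked by hand for this illustration
    w = np.full(x.size, 0.5)
    err = np.max(np.abs(binned_fft_kde(x, w, h) - direct_kde(x, w, h)))
    print(f"max |FFT KDE - direct KDE| = {err:.2e}")
    pi, mu, sigma, tau = 0.5, float(np.median(x)), 1.0, np.full(x.size, 0.5)
    for _ in range(10):
        pi, mu, sigma, tau = weighted_em_step(x, pi, mu, sigma, tau, h)
    print(f"pi={pi:.3f}, mu={mu:.3f}, sigma={sigma:.3f}")
```

Passing `kde=direct_kde` to `weighted_em_step` gives the slow baseline, so the two variants can be timed against each other on the same data. This sketch imposes no constraints on the background density, so it should not be read as a complete estimator.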