Semi- and non-parametric mixtures of regressions are a flexible class of mixtures of regressions in which some or all of the parameters are non-parametric functions of the covariates. These models, however, assume Gaussian component error distributions, so their estimation is sensitive to outliers and heavy-tailed errors. In this paper, we propose semi- and non-parametric contaminated Gaussian mixtures of regressions to robustly estimate the parametric and/or non-parametric terms of the models in the presence of mild outliers. The virtue of a contaminated Gaussian error distribution is that it allows model-based clustering of observations and model-based outlier detection to be performed simultaneously. We propose two algorithms, an expectation-maximization (EM)-type algorithm and an expectation-conditional-maximization (ECM)-type algorithm, to perform maximum likelihood and local-likelihood kernel estimation of the parametric and non-parametric terms of the proposed models, respectively. The robustness of the proposed models is examined in an extensive simulation study, and their practical utility is demonstrated on real data.
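To illustrate why a contaminated Gaussian error distribution supports outlier detection, the following is a minimal sketch for a single component, not the estimation procedure of the paper. It assumes the commonly used two-part form of the contaminated Gaussian density, alpha * N(mean, var) + (1 - alpha) * N(mean, eta * var), where alpha is the proportion of typical observations and eta > 1 inflates the variance to absorb mild outliers. The function names, the parameter values, and the 0.5 cut-off are illustrative assumptions; in a mixture of regressions, `mean` would be the component's regression function evaluated at the covariates.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def contaminated_normal_pdf(y, mean, var, alpha, eta):
    """Assumed form: alpha * N(mean, var) + (1 - alpha) * N(mean, eta * var)."""
    typical_part = alpha * normal_pdf(y, mean, var)
    inflated_part = (1.0 - alpha) * normal_pdf(y, mean, eta * var)
    return typical_part + inflated_part


def prob_typical(y, mean, var, alpha, eta):
    """Posterior probability that y arose from the non-inflated (typical) part."""
    typical_part = alpha * normal_pdf(y, mean, var)
    return typical_part / contaminated_normal_pdf(y, mean, var, alpha, eta)


# Hypothetical parameter values, chosen only for illustration.
mean, var, alpha, eta = 0.0, 1.0, 0.9, 20.0

for y in (0.5, 2.0, 5.0):
    p = prob_typical(y, mean, var, alpha, eta)
    # Illustrative rule: flag y as an outlier when the typical part is less likely.
    label = "outlier" if p < 0.5 else "typical"
    print(f"y={y:4.1f}  P(typical)={p:.3f}  {label}")
```

Observations far from the component mean receive a low posterior probability of being typical, so the same fitted density that defines the cluster also yields an outlier flag for each observation.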