Model selection is a cornerstone of statistical inference, where information criteria are widely employed to balance model fit and complexity. However, classical likelihood-based criteria are often highly sensitive to contamination, outliers, and model misspecification. In this paper, we develop a robust alternative based on the Exponential-Polynomial Divergence, a flexible extension of existing divergence measures that enhances adaptability to diverse data irregularities. The proposed Exponential-Polynomial Divergence Information Criterion preserves the objective of approximating the discrepancy between the true model and candidate models while incorporating robustness against anomalous observations. We establish its theoretical properties and examine its robustness through influence function analysis, which demonstrates controlled sensitivity to extreme data points. For practical implementation, the tuning parameter is selected by a data-driven strategy based on generalized score matching, which improves computational stability and efficiency. The effectiveness of the proposed method is demonstrated through extensive simulation studies under varying contamination levels, as well as real data applications involving linear mixed-effects panel data models and neural network-based prediction tasks. The results consistently show improved stability and reliability compared with information criteria based on the classical likelihood and on the density power divergence. The proposed framework thus provides a practical and unified approach for model selection in complex and contaminated data settings.