Density power divergence (DPD) [Basu et al. (1998), Biometrika], designed to estimate the underlying distribution of the observations robustly, includes an integral of a power of the parametric density model being estimated. Although this integral has an explicit form for some specific densities (such as the normal and exponential densities), its computational intractability has prevented DPD-based estimation from being applied to more general parametric densities for more than a quarter of a century since DPD was proposed. This study proposes a stochastic optimization approach to minimizing DPD for general parametric density models and explains its adequacy with reference to the conventional theory of stochastic optimization. The proposed approach can also be applied to minimizing another density power-based divergence, the $\gamma$-divergence, with the aid of unnormalized models [Kanamori and Fujisawa (2015), Biometrika].
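To make the idea concrete, the following is a minimal sketch of one way such a stochastic optimization can be set up; it is an illustration under stated assumptions, not necessarily the algorithm of this study. It assumes the usual empirical DPD objective with power $\alpha > 0$, namely $-\frac{1}{\alpha n}\sum_i f_\theta(x_i)^\alpha + \frac{1}{1+\alpha}\int f_\theta(x)^{1+\alpha}\,dx$, and uses the identity $\nabla_\theta \frac{1}{1+\alpha}\int f_\theta^{1+\alpha}\,dx = E_{Y \sim f_\theta}[f_\theta(Y)^\alpha \nabla_\theta \log f_\theta(Y)]$, so that the gradient of the integral term can be estimated without bias from samples drawn from the current model. The normal model is chosen only so that the result is easy to sanity-check; all function names, the step-size schedule, and the hyperparameters are hypothetical choices.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def normal_score(x, mu, sigma):
    """Gradient of log f with respect to theta = (mu, log sigma), one row per x."""
    z = (x - mu) / sigma
    return np.stack([z / sigma, z ** 2 - 1.0], axis=-1)

def dpd_stochastic_gradient(theta, x_batch, rng, alpha, n_model_samples):
    """Unbiased estimate of the gradient of the empirical DPD objective.

    The integral term is handled by sampling from the current model,
    so no closed form of the integral is needed (assumed setup).
    """
    mu, sigma = theta[0], np.exp(theta[1])
    y = rng.normal(mu, sigma, size=n_model_samples)  # samples from f_theta
    w_x = normal_pdf(x_batch, mu, sigma) ** alpha    # weights f(x)^alpha on data
    w_y = normal_pdf(y, mu, sigma) ** alpha          # weights f(y)^alpha on model samples
    data_term = (w_x[:, None] * normal_score(x_batch, mu, sigma)).mean(axis=0)
    model_term = (w_y[:, None] * normal_score(y, mu, sigma)).mean(axis=0)
    return model_term - data_term

def fit_dpd_sgd(x, alpha=0.5, n_steps=3000, batch_size=64,
                n_model_samples=64, lr0=0.5, seed=0):
    """Plain SGD with a decaying step size (hypothetical schedule)."""
    rng = np.random.default_rng(seed)
    med = np.median(x)
    mad_scale = 1.4826 * np.median(np.abs(x - med))
    theta = np.array([med, np.log(mad_scale)])       # robust starting point
    for t in range(1, n_steps + 1):
        batch = rng.choice(x, size=batch_size, replace=True)
        grad = dpd_stochastic_gradient(theta, batch, rng, alpha, n_model_samples)
        step = lr0 / (1.0 + 0.01 * t)                # sum diverges, sum of squares converges
        theta = theta - step * grad
    return theta[0], np.exp(theta[1])

# Usage: 95% of the data from N(0, 1), 5% outliers around 10.
data_rng = np.random.default_rng(1)
x = np.concatenate([data_rng.normal(0.0, 1.0, 1900),
                    data_rng.normal(10.0, 1.0, 100)])
mu_hat, sigma_hat = fit_dpd_sgd(x)
print("DPD estimate:", mu_hat, sigma_hat)
print("sample mean and std:", x.mean(), x.std())
```

In this sketch the outliers receive weights $f_\theta(x)^\alpha$ that are close to zero, so the estimate should stay near the bulk of the data while the sample mean and standard deviation are pulled toward the contamination. For a model without a closed-form integral, only `normal_pdf`, `normal_score`, and the sampler would need to be replaced.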