Density ratio estimation (DRE) is a fundamental machine learning technique for capturing the relationship between two probability distributions. State-of-the-art DRE methods estimate the density ratio with neural networks trained on loss functions derived from variational representations of $f$-divergence. However, existing methods face optimization challenges, such as overfitting caused by lower-unbounded loss functions, biased mini-batch gradients, vanishing training-loss gradients, and the high sample requirements of Kullback-Leibler (KL) divergence loss functions. To address these issues, we focus on $\alpha$-divergence, which admits a suitable variational representation of $f$-divergence. From this representation, we derive a novel loss function for DRE, the $\alpha$-divergence loss function ($\alpha$-Div). $\alpha$-Div is concise yet provides stable and effective optimization for DRE. Moreover, the boundedness of $\alpha$-divergence offers the potential for successful DRE on data exhibiting high KL divergence. Our numerical experiments demonstrate the effectiveness of $\alpha$-Div in optimization. However, the experiments also show that the proposed loss function offers no significant advantage over the KL-divergence loss function in terms of the RMSE of DRE. This indicates that the accuracy of DRE is determined primarily by the amount of KL divergence in the data and depends only weakly on the choice of $\alpha$-divergence.
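For intuition, one standard construction of such a loss (a sketch of the general recipe under a common generator convention, not necessarily the exact form of $\alpha$-Div) starts from the variational representation of an $f$-divergence $D_f(p \,\|\, q) = \mathbb{E}_q\!\left[ f(p/q) \right]$, with $f$ convex and $f(1) = 0$:

$$
D_f(p \,\|\, q) \;=\; \sup_{T} \Big\{ \mathbb{E}_{p}\big[T(x)\big] \;-\; \mathbb{E}_{q}\big[f^{*}\big(T(x)\big)\big] \Big\},
$$

where $f^{*}$ is the convex conjugate of $f$ and the supremum is attained at $T^{*} = f'(p/q)$. Parameterizing $T = f'(r_\theta)$ for a positive ratio model $r_\theta$ and using $f^{*}(f'(u)) = u f'(u) - f(u)$, the $\alpha$-divergence generator $f_\alpha(u) = \frac{u^\alpha - \alpha u + \alpha - 1}{\alpha(\alpha - 1)}$ (for $\alpha \notin \{0, 1\}$) yields, up to additive constants, the trainable loss

$$
\mathcal{L}_\alpha(r_\theta) \;=\; \frac{1}{\alpha}\,\mathbb{E}_{q}\big[r_\theta(x)^{\alpha}\big] \;-\; \frac{1}{\alpha - 1}\,\mathbb{E}_{p}\big[r_\theta(x)^{\alpha - 1}\big],
$$

whose pointwise minimizer is $r_\theta = p/q$. For $\alpha \in (0, 1)$, Jensen's inequality gives $\mathbb{E}_q\!\left[ (p/q)^\alpha \right] \le 1$, so $D_\alpha(p \,\|\, q) \le \frac{1}{\alpha(1-\alpha)}$; this boundedness, in contrast to the unbounded KL divergence, underlies the prospect of stable optimization on high-KL data.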