Density ratio estimation (DRE) is a fundamental machine learning technique for capturing the relationship between two probability distributions. State-of-the-art DRE methods estimate the density ratio with neural networks trained on loss functions derived from variational representations of $f$-divergence. However, existing methods face optimization challenges, such as overfitting caused by lower-unbounded loss functions, biased mini-batch gradients, vanishing training-loss gradients, and the high sample requirements of Kullback-Leibler (KL) divergence loss functions. To address these issues, we focus on $\alpha$-divergence, which admits a suitable variational representation of $f$-divergence. From this representation, we derive a novel loss function for DRE, the $\alpha$-divergence loss function ($\alpha$-Div). $\alpha$-Div is concise yet provides stable and effective optimization for DRE. Moreover, the boundedness of $\alpha$-divergence offers the potential for successful DRE on data exhibiting high KL divergence. Our numerical experiments demonstrate the effectiveness of $\alpha$-Div in optimization. However, the experiments also show that the proposed loss function offers no significant advantage over the KL-divergence loss function in terms of the RMSE of DRE. This indicates that the accuracy of DRE is determined primarily by the amount of KL divergence in the data and depends only weakly on the choice of $\alpha$-divergence.
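For intuition, one standard construction of such a loss (a sketch of the general recipe under a common generator convention, not necessarily the exact form of $\alpha$-Div) starts from the variational representation of an $f$-divergence $D_f(p \,\|\, q) = \mathbb{E}_q\!\left[ f(p/q) \right]$, with $f$ convex and $f(1) = 0$:

$$
D_f(p \,\|\, q) \;=\; \sup_{T} \Big\{ \mathbb{E}_{p}\big[T(x)\big] \;-\; \mathbb{E}_{q}\big[f^{*}\big(T(x)\big)\big] \Big\},
$$

where $f^{*}$ is the convex conjugate of $f$ and the supremum is attained at $T^{*} = f'(p/q)$. Parameterizing $T = f'(r_\theta)$ for a positive ratio model $r_\theta$ and using $f^{*}(f'(u)) = u f'(u) - f(u)$, the $\alpha$-divergence generator $f_\alpha(u) = \frac{u^\alpha - \alpha u + \alpha - 1}{\alpha(\alpha - 1)}$ (for $\alpha \notin \{0, 1\}$) yields, up to additive constants, the trainable loss

$$
\mathcal{L}_\alpha(r_\theta) \;=\; \frac{1}{\alpha}\,\mathbb{E}_{q}\big[r_\theta(x)^{\alpha}\big] \;-\; \frac{1}{\alpha - 1}\,\mathbb{E}_{p}\big[r_\theta(x)^{\alpha - 1}\big],
$$

whose pointwise minimizer is $r_\theta = p/q$. For $\alpha \in (0, 1)$, Jensen's inequality gives $\mathbb{E}_q\!\left[ (p/q)^\alpha \right] \le 1$, so $D_\alpha(p \,\|\, q) \le \frac{1}{\alpha(1-\alpha)}$; this boundedness, in contrast to the unbounded KL divergence, underlies the prospect of stable optimization on high-KL data.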