The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented under mild conditions on $f$. Under these conditions, the optimal measure is shown to be unique. Examples of the solution are given for particular choices of the function $f$. By leveraging the flexibility of the family of $f$-divergences, previously known solutions for common regularization choices are recovered, including the unique solutions to empirical risk minimization with relative entropy regularization (Type-I and Type-II). The analysis of the solution unveils the following properties of $f$-divergences when used in the ERM-$f$DR problem: (i) $f$-divergence regularization forces the support of the solution to coincide with the support of the reference measure, which introduces a strong inductive bias that dominates the evidence provided by the training data; and (ii) any $f$-divergence regularization is equivalent to a different $f$-divergence regularization with an appropriate transformation of the empirical risk function.
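As an illustration of the relative entropy special case and of the support property, the following is a minimal sketch and not the paper's general construction. It assumes a finite set of four candidate models, a hypothetical vector of empirical risks, and a reference measure that assigns zero mass to one model. In this setting the well-known Type-I solution is the Gibbs measure, proportional to the reference measure times $\exp(-\text{risk}/\lambda)$. The names `gibbs_solution` and `objective` are illustrative only.

```python
import numpy as np

def gibbs_solution(risks, reference, lam):
    """Minimizer of sum(p * risks) + lam * KL(p || reference) over probability
    vectors p on a finite model set (Type-I relative entropy regularization)."""
    risks = np.asarray(risks, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Subtracting the minimum risk only rescales the weights; it avoids underflow.
    weights = reference * np.exp(-(risks - risks.min()) / lam)
    return weights / weights.sum()

def objective(p, risks, reference, lam):
    """Expected empirical risk plus lam times the relative entropy KL(p || reference).
    Assumes p puts no mass where reference is zero."""
    mask = p > 0
    kl = np.sum(p[mask] * np.log(p[mask] / reference[mask]))
    return float(np.dot(p, risks) + lam * kl)

# Hypothetical data: the fourth model has the lowest empirical risk,
# but the reference measure gives it zero mass.
risks = np.array([0.9, 0.1, 0.5, 0.0])
reference = np.array([0.5, 0.3, 0.2, 0.0])
lam = 0.25

p_star = gibbs_solution(risks, reference, lam)
print("solution:", p_star)
print("objective:", objective(p_star, risks, reference, lam))
```

In this sketch the fourth model receives zero probability even though its empirical risk is the smallest, because the reference measure excludes it. This is property (i) in miniature: the support of the solution is that of the reference measure, whatever the training data suggest.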