In deep learning, classification tasks are formalized as optimization problems, often solved by minimizing the cross-entropy. However, recent advances in the design of objective functions allow the use of $f$-divergences to generalize the formulation of the optimization problem for classification. We adopt a Bayesian perspective and formulate the classification task as a maximum a posteriori probability problem. We propose a class of objective functions based on the variational representation of the $f$-divergence. Furthermore, motivated by the challenge of improving on the state-of-the-art approach, we propose a bottom-up method that leads to the formulation of an objective function corresponding to a novel $f$-divergence, referred to as shifted log (SL). We theoretically analyze the proposed objective functions and numerically test them in three application scenarios: toy examples, image datasets, and signal detection/decoding problems. The results demonstrate the effectiveness of the proposed approach and show that the SL divergence achieves the highest classification accuracy in almost all the considered cases.
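For orientation, the variational representation of the $f$-divergence mentioned above is commonly written in the following textbook form. The notation here ($P$, $Q$ with densities $p$, $q$, the test function $T$, and the Fenchel conjugate $f^*$) is assumed for illustration and is not necessarily the notation used by the authors:

$$
D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx \;\ge\; \sup_{T \in \mathcal{T}} \Big( \mathbb{E}_{x \sim P}\big[T(x)\big] - \mathbb{E}_{x \sim Q}\big[f^*(T(x))\big] \Big), \qquad f^*(t) = \sup_{u} \{\, u t - f(u) \,\}.
$$

The bound is tight when the function class $\mathcal{T}$ contains $T^*(x) = f'\!\left(p(x)/q(x)\right)$. As one familiar instance, the Kullback-Leibler divergence corresponds to $f(u) = u \log u$, for which $f^*(t) = e^{t-1}$.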