This paper investigates a range of empirical risk functions and regularization methods for self-training in semi-supervised learning. These approaches are inspired by divergence measures, namely $f$-divergences and $\alpha$-R\'enyi divergences, and we use the theoretical foundations of these divergences to provide insight into the proposed empirical risk functions and regularization techniques. In self-training methods such as pseudo-labeling and entropy minimization, there is an inherent mismatch between the true labels and the pseudo-labels (i.e., the pseudo-labels are noisy), and some of our empirical risk functions are robust to such noisy pseudo-labels. Under some conditions, our empirical risk functions outperform traditional self-training methods.
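For readers unfamiliar with the divergences named above, the following is a minimal sketch of their standard textbook definitions for discrete distributions. It is not the paper's empirical risk functions or regularizers. The function names (`f_divergence`, `kl_generator`, `renyi_divergence`, `entropy`) and the example probability vectors are illustrative assumptions. The sketch also shows two standard facts: the KL divergence (the $f$-divergence with $f(t) = t \log t$) between a one-hot pseudo-label and a prediction equals the usual cross-entropy loss, and the $\alpha$-R\'enyi divergence approaches the KL divergence as $\alpha \to 1$.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

def kl_generator(t):
    """Generator f(t) = t * log(t) of the KL divergence, with 0 * log(0) = 0."""
    t = np.asarray(t, dtype=float)
    safe_t = np.where(t > 0, t, 1.0)  # avoids log(0); those entries contribute 0
    return np.where(t > 0, t * np.log(safe_t), 0.0)

def renyi_divergence(p, q, alpha):
    """D_alpha(P || Q) = log(sum_i p_i^alpha * q_i^(1 - alpha)) / (alpha - 1)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0))

def entropy(p):
    """Shannon entropy H(P) = -sum_i p_i * log(p_i), the entropy-minimization target."""
    p = np.asarray(p, dtype=float)
    safe_p = np.where(p > 0, p, 1.0)
    return float(-np.sum(p * np.log(safe_p)))

# Hypothetical model prediction and hard pseudo-label (argmax of the prediction).
prediction = np.array([0.5, 0.3, 0.2])
pseudo_label = np.eye(3)[np.argmax(prediction)]  # one-hot vector [1, 0, 0]

# KL(one-hot || prediction) reduces to the cross-entropy -log(prediction[y]).
kl_pseudo = f_divergence(pseudo_label, prediction, kl_generator)
cross_entropy = -np.log(prediction[np.argmax(prediction)])

# For a soft target, the Renyi divergence with alpha close to 1 approximates KL.
soft_target = np.array([0.7, 0.2, 0.1])
kl_soft = f_divergence(soft_target, prediction, kl_generator)
renyi_near_one = renyi_divergence(soft_target, prediction, alpha=1.0 + 1e-6)
```

Other choices of the generator `f`, or of `alpha`, give other members of the two families; which of them are robust to noisy pseudo-labels is the question the paper addresses and is not settled by this sketch.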