Weak-label learning is a challenging task: the model must learn from data "bags" that contain both positive and negative instances, yet only the bag-level labels are known. Because the pool of negative instances is usually larger than the pool of positive instances, selecting the most informative negative instances is critical for performance. How to select negative instances from each bag remains an open problem that has not been well studied in weak-label learning. In this paper, we study several sampling strategies that measure the usefulness of negative instances for weak-label learning and select them accordingly. We evaluate our approach on the CIFAR-10 and AudioSet datasets and show that it improves weak-label classification performance and reduces computational cost compared with random sampling. Our work reveals that negative instances are not all equally irrelevant, and that selecting them wisely can benefit weak-label learning.
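The abstract does not specify how usefulness is measured. The sketch below contrasts the random-sampling baseline with one plausible usefulness measure: the current model's positive-class score on each negative instance, keeping the k highest-scoring (most confusable) ones, i.e. hard-negative mining. The function name `select_negatives`, its parameters, and the choice of score are hypothetical and chosen for illustration; this is not the authors' method. Keeping only k negatives per bag is also where the computational saving would come from, since only those instances enter the loss.

```python
import numpy as np


def select_negatives(scores, k, strategy="hard", rng=None):
    """Return indices of up to k negative instances chosen from one bag.

    scores: current model's positive-class scores for the bag's negative
        instances (hypothetical input; a higher score means the negative is
        more easily confused with a positive).
    strategy: "hard" keeps the k highest-scoring negatives (hard-negative
        mining, an assumed usefulness measure); "random" samples k of them
        uniformly without replacement, as a baseline.
    """
    scores = np.asarray(scores, dtype=float)
    k = min(k, scores.size)  # a bag may hold fewer than k negatives
    if strategy == "hard":
        # Sort ascending, reverse to descending, keep the top k.
        return np.argsort(scores)[::-1][:k]
    if strategy == "random":
        rng = np.random.default_rng() if rng is None else rng
        return rng.choice(scores.size, size=k, replace=False)
    raise ValueError(f"unknown strategy: {strategy!r}")


bag_scores = np.array([0.1, 0.8, 0.3, 0.9, 0.05])
print(select_negatives(bag_scores, k=2))  # [3 1]
print(select_negatives(bag_scores, k=2, strategy="random",
                       rng=np.random.default_rng(0)))
```

Under this assumption, the "hard" strategy spends the training budget on negatives the model currently gets wrong, while the "random" strategy treats all negatives as equally useful.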