We consider the problem of Learning from Label Proportions (LLP), a weakly supervised classification setup in which instances are grouped into "bags" and only the frequency of class labels in each bag is available. The learner's objective, however, is to achieve low task loss at the level of individual instances. Here we propose EasyLLP: a flexible, simple-to-implement debiasing approach based on aggregate labels that operates on arbitrary loss functions. Our technique lets us accurately estimate the expected loss of an arbitrary model at the individual level. We showcase the flexibility of our approach by applying it to popular learning frameworks, such as Empirical Risk Minimization (ERM) and Stochastic Gradient Descent (SGD), with provable guarantees on instance-level performance. More concretely, we exhibit a variance reduction technique under which the quality of LLP learning deteriorates only by a factor of k (where k is the bag size) in both the ERM and SGD setups, compared with full supervision. Finally, we validate our theoretical results on multiple datasets, demonstrating that our algorithm performs as well as or better than previous LLP approaches despite its simplicity.
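The abstract does not state the debiasing formula itself. As an illustration only, here is a minimal Python sketch of one standard way to turn a bag proportion into an unbiased instance-level loss, assuming binary labels, a known class prior p = P(y = 1), and bags formed by drawing k instances i.i.d. The helper names (`soft_label`, `debiased_loss`, `expected_soft_label`) are hypothetical, and this is not claimed to be the paper's exact estimator. The idea is to replace the unknown instance label with the surrogate ỹ = k(α − p) + p, where α is the bag's positive fraction, and plug it into ỹ·ℓ(h(x), 1) + (1 − ỹ)·ℓ(h(x), 0); the last function checks exactly that E[ỹ | x] equals P(y = 1 | x).

```python
import math


def soft_label(alpha: float, k: int, p: float) -> float:
    """Hypothetical debiased surrogate label built from a bag's label proportion.

    alpha: fraction of positive labels in the bag (the aggregate label)
    k:     bag size
    p:     class prior P(y = 1), assumed known
    The value may fall outside [0, 1]; only its expectation is meaningful.
    """
    return k * (alpha - p) + p


def debiased_loss(loss_pos: float, loss_neg: float,
                  alpha: float, k: int, p: float) -> float:
    """Instance-level loss estimate: soft-label mix of loss(h(x), 1) and loss(h(x), 0)."""
    y_tilde = soft_label(alpha, k, p)
    return y_tilde * loss_pos + (1.0 - y_tilde) * loss_neg


def expected_soft_label(eta: float, k: int, p: float) -> float:
    """Exact E[soft_label | x], assuming this instance's label is Bernoulli(eta)
    and the other k - 1 labels in the bag are i.i.d. Bernoulli(p)."""
    total = 0.0
    for m in range(k):  # m = number of positives among the other k - 1 instances
        weight = math.comb(k - 1, m) * p**m * (1 - p) ** (k - 1 - m)
        total += weight * (eta * soft_label((m + 1) / k, k, p)
                           + (1 - eta) * soft_label(m / k, k, p))
    return total


if __name__ == "__main__":
    # Log loss for a model predicting q = P(y = 1 | x) on one instance
    q = 0.8
    est = debiased_loss(-math.log(q), -math.log(1 - q), alpha=0.5, k=4, p=0.3)
    print(est)
    # Unbiasedness check: the expected surrogate label recovers eta
    print(expected_soft_label(0.7, 5, 0.4))
```

Because ỹ can leave [0, 1] (for α = 0.5, k = 4, p = 0.3 it equals 1.1), a single loss estimate may even be negative; only its expectation is unbiased, and the variance of ỹ grows with the bag size k.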