We consider the problem of Learning from Label Proportions (LLP), a weakly supervised classification setup in which instances are grouped into "bags" and only the frequency of class labels in each bag is available. The learner's objective, however, is to achieve low task loss at the level of individual instances. We propose EasyLLP, a flexible and simple-to-implement debiasing approach based on aggregate labels that operates on arbitrary loss functions. Our technique allows us to accurately estimate the expected loss of an arbitrary model at the individual level. We showcase the flexibility of our approach by applying it to popular learning frameworks, such as Empirical Risk Minimization (ERM) and Stochastic Gradient Descent (SGD), with provable guarantees on instance-level performance. More concretely, we exhibit a variance reduction technique under which the quality of LLP learning deteriorates only by a factor of k (k being the bag size) in both the ERM and SGD setups, as compared to full supervision. Finally, we validate our theoretical results on multiple datasets, demonstrating that our algorithm performs as well as or better than previous LLP approaches in spite of its simplicity.
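To make the idea of debiasing with aggregate labels concrete, the following is a minimal sketch of one way an unbiased instance-level loss estimate can be built from a bag's label proportion. It is an illustration under stated assumptions, not necessarily the exact estimator of the paper: labels are binary, the instances in a bag are drawn i.i.d., and the class prior `p` is known. The function names (`surrogate_label`, `debiased_bag_loss`, `simulate`) and the synthetic data are hypothetical. The observation is that, for a bag of size k with label proportion alpha, the quantity k(alpha - p) + p has conditional expectation equal to each instance's own label, so plugging it into a loss that is linear in the label yields an unbiased estimate.

```python
import numpy as np


def surrogate_label(alpha, k, p):
    """Soft label k*(alpha - p) + p (assumes binary labels, i.i.d. bags, known prior p).

    Its expectation, conditioned on any one instance of the bag, equals that
    instance's true label, because the other k-1 labels average out to p.
    """
    return k * (alpha - p) + p


def debiased_bag_loss(loss_if_pos, loss_if_neg, alpha, p):
    """Average debiased loss over one bag.

    loss_if_pos[i] / loss_if_neg[i] are the losses the model would incur on
    instance i if its label were 1 / 0. Only the bag proportion alpha is used.
    """
    loss_if_pos = np.asarray(loss_if_pos, dtype=float)
    loss_if_neg = np.asarray(loss_if_neg, dtype=float)
    k = loss_if_pos.shape[0]
    y_tilde = surrogate_label(alpha, k, p)
    return float(np.mean(y_tilde * loss_if_pos + (1.0 - y_tilde) * loss_if_neg))


def simulate(n_bags=20000, k=8, seed=0):
    """Compare the debiased estimate with the true instance-level squared loss."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_bags, k))                      # synthetic 1-D features
    y = (rng.random((n_bags, k)) < x).astype(float)  # P(y=1 | x) = x, so p = 0.5
    q = x                                            # hypothetical model score
    true_loss = float(np.mean((q - y) ** 2))         # needs instance labels
    alphas = y.mean(axis=1)                          # what an LLP learner sees
    estimates = [
        debiased_bag_loss((q[b] - 1.0) ** 2, q[b] ** 2, alphas[b], 0.5)
        for b in range(n_bags)
    ]
    return float(np.mean(estimates)), true_loss


if __name__ == "__main__":
    estimate, truth = simulate()
    print(f"debiased estimate: {estimate:.4f}, true loss: {truth:.4f}")
```

Individual bag estimates in this sketch can fall outside the range of the loss (they may even be negative), and their spread grows with the bag size k, which is in line with the abstract's emphasis on variance reduction.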