In learning from aggregate labels, the training data consists of sets or "bags" of feature-vectors (instances), along with an aggregate label for each bag derived from the (usually {0,1}-valued) labels of its instances. In learning from label proportions (LLP) the aggregate label is the average of the bag's instance labels, whereas in multiple instance learning (MIL) it is their OR. The goal is to train an instance-level predictor, typically by fitting a model on the training data, in particular one that maximizes the accuracy, defined as the fraction of satisfied bags, i.e., those on which the predicted labels are consistent with the aggregate label. A weak learner achieves a constant accuracy < 1 on the training bags, while a strong learner's accuracy can be made arbitrarily close to 1. We study whether a weak learner on such training bags with aggregate labels can be used to obtain a strong learner, analogous to boosting in supervised learning. Our first result shows the impossibility of boosting in LLP using weak classifiers of any accuracy < 1: we construct a collection of bags for which such weak learners exist (under any weight assignment) while no strong learner does. A variant of this construction also rules out boosting in MIL for a non-trivial range of weak-learner accuracies. In the LLP setting, however, we show that a weak learner (with small accuracy) on large enough bags can in fact be used to obtain, in polynomial time, a strong learner for small bags. We also provide a more efficient, sampling-based variant of our procedure with probabilistic guarantees, which we empirically validate on three real and two synthetic datasets. Our work is the first to theoretically study weak-to-strong learning from aggregate labels: we give an algorithm achieving it for LLP, while proving the impossibility of boosting for both LLP and MIL.
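To make the bag-level accuracy notion concrete, the following minimal sketch (illustrative only, not code from the paper; all function names and the toy bags are hypothetical) checks whether a bag is satisfied under the LLP and MIL semantics described above, and computes accuracy as the fraction of satisfied bags:

```python
# Illustrative sketch: bag "satisfaction" under LLP and MIL aggregate labels.
# A bag is satisfied when the predicted {0,1} instance labels are consistent
# with its aggregate label: the label proportion (average) for LLP, the OR for MIL.

def llp_satisfied(pred_labels, label_proportion):
    """LLP: the mean of the predicted labels must equal the bag's proportion."""
    return sum(pred_labels) / len(pred_labels) == label_proportion

def mil_satisfied(pred_labels, or_label):
    """MIL: the OR of the predicted labels must equal the bag's aggregate label."""
    return int(any(pred_labels)) == or_label

def bag_accuracy(bags, satisfied):
    """Accuracy of a predictor = fraction of satisfied bags."""
    return sum(satisfied(preds, agg) for preds, agg in bags) / len(bags)

# Hypothetical predictions on three LLP bags with their label proportions;
# the third bag is unsatisfied (predicted proportion 0.5, aggregate label 1.0).
llp_bags = [([1, 0], 0.5), ([1, 1], 1.0), ([0, 1], 1.0)]
print(bag_accuracy(llp_bags, llp_satisfied))  # 2 of 3 bags satisfied
```

A weak learner in this framework is any predictor whose `bag_accuracy` on the training bags is a constant below 1; a strong learner drives it arbitrarily close to 1.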