Conventional supervised classification requires a true label for every individual instance. Collecting such labels can be prohibitive, however, because of privacy concerns or unaffordable annotation costs. This motivates the study of classification from aggregate observations (CFAO), where supervision is provided for groups of instances rather than for individual instances. CFAO is a general learning framework that covers various learning problems, such as multiple-instance learning and learning from label proportions. The goal of this paper is to present a novel universal method for CFAO that yields an unbiased estimator of the classification risk for arbitrary losses, a goal that previous research failed to achieve. Practically, our method works by weighting the importance of each label for each instance in the group, which provides purified supervision for the classifier to learn from. Theoretically, our proposed method guarantees risk consistency, owing to the unbiased risk estimator, and is compatible with arbitrary losses. Extensive experiments on various CFAO problems demonstrate the superiority of our proposed method.
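The label-weighting idea can be illustrated with a minimal sketch. This is not the paper's estimator. It is an assumed toy instance for binary multiple-instance learning, where a group is labeled 1 if at least one instance is positive, and instances are assumed conditionally independent. The names `label_weights`, `log_loss`, and `weighted_bag_loss` are hypothetical. Each instance's weight is the posterior probability of its label being positive given the aggregate observation, computed from the classifier's current predictions.

```python
import math


def label_weights(probs, bag_label):
    """Posterior P(y_i = 1 | bag label) per instance.

    Assumption (not from the paper): instances are independent and
    the bag label is 1 iff at least one instance is positive.
    """
    if bag_label == 0:
        # A negative bag implies that every instance is negative.
        return [0.0 for _ in probs]
    # P(at least one positive) = 1 - prod_j (1 - p_j)
    p_any = 1.0 - math.prod(1.0 - p for p in probs)
    if p_any <= 0.0:
        raise ValueError("positive bag has zero probability under the model")
    # y_i = 1 already implies a positive bag, so the posterior is p_i / p_any.
    return [p / p_any for p in probs]


def log_loss(p, y):
    """Binary cross-entropy for one prediction; any loss could be swapped in."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


def weighted_bag_loss(probs, bag_label, loss=log_loss):
    """Average loss over the bag, with each candidate label weighted
    by its posterior probability given the aggregate observation."""
    weights = label_weights(probs, bag_label)
    total = 0.0
    for p, w in zip(probs, weights):
        total += w * loss(p, 1) + (1.0 - w) * loss(p, 0)
    return total / len(probs)
```

In a training loop the weights would typically be treated as constants when differentiating the loss. Whether that matches the paper's procedure is an assumption here.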