The purpose of class distribution estimation (also known as quantification) is to determine the values of the prior class probabilities in a test dataset without class label observations. A variety of methods to achieve this have been proposed in the literature, most of them based on the assumption that the distributions of the training and test data are related through prior probability shift (also known as label shift). Among these methods, Friedman's method has recently been found to perform relatively well both for binary and multi-class quantification. We discuss the properties of Friedman's method and another approach mentioned by Friedman (called DeBias method in the literature) in the context of a general framework for designing linear equation systems for class distribution estimation.
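A minimal sketch of the general linear-equation framework follows. Under prior probability shift the class-conditional distributions do not change, so for any vector of feature functions f we have E_test[f(X)] = Σ_j q_j E_train[f(X) | Y = j], where q_j are the unknown test priors. Estimating the matrix A[i, j] = E_train[f_i(X) | Y = j] and the vector b = E_test[f(X)] gives the system A q = b. The sketch appends the constraint Σ_j q_j = 1 and solves by least squares. The feature choice f_i(x) = 1{P_train(Y = i | X = x) > P_train(Y = i)} is the one commonly attributed to Friedman's method in the quantification literature; treat it as an assumption here. The DeBias method is not implemented. The helper names `posterior_table`, `friedman_features` and `solve_linear_system` are hypothetical. The data are a toy discrete example, not results from the paper.

```python
import numpy as np


def posterior_table(x_train, y_train, n_values, n_classes):
    """Empirical P_train(Y=j | X=x) for a discrete feature x in {0, ..., n_values-1}.

    Assumes every value of x occurs at least once in the training data.
    """
    counts = np.zeros((n_values, n_classes))
    np.add.at(counts, (x_train, y_train), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def friedman_features(post, train_priors):
    """Assumed Friedman-style features: f_i(x) = 1{P_train(Y=i|X=x) > P_train(Y=i)}."""
    return (post > train_priors).astype(float)


def solve_linear_system(F_train, y_train, F_test, n_classes):
    """Estimate test priors q from A q = b plus the constraint sum(q) = 1."""
    # A[i, j] = E_train[f_i(X) | Y = j]: column j averages features over class j.
    A = np.column_stack(
        [F_train[y_train == j].mean(axis=0) for j in range(n_classes)]
    )
    # b[i] = E_test[f_i(X)]: computed without any test labels.
    b = F_test.mean(axis=0)
    # Append sum_j q_j = 1 as an extra equation and solve by least squares.
    A_aug = np.vstack([A, np.ones(n_classes)])
    b_aug = np.append(b, 1.0)
    q, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return q  # may contain negative entries on noisy real data


# Toy training data: one discrete feature x in {0, 1, 2}, three classes, 5 samples each.
x_train = np.array([0, 0, 0, 1, 2, 1, 1, 1, 0, 2, 2, 2, 2, 0, 1])
y_train = np.repeat([0, 1, 2], 5)
train_priors = np.bincount(y_train) / len(y_train)
table = posterior_table(x_train, y_train, n_values=3, n_classes=3)

# Synthetic test sample under label shift: class-conditional samples reused 2:1:1.
x_test = np.concatenate([x_train[:5], x_train[:5], x_train[5:10], x_train[10:]])

F_train = friedman_features(table[x_train], train_priors)
F_test = friedman_features(table[x_test], train_priors)
q_hat = solve_linear_system(F_train, y_train, F_test, n_classes=3)
```

Because the toy test sample is built exactly as a 2:1:1 mixture of the training class-conditional samples, the system is consistent and `q_hat` recovers (0.5, 0.25, 0.25) up to floating-point error. With real, finite samples the solution is only approximate and may need to be clipped and renormalised.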