Quantification, or prevalence estimation, is the task of predicting the prevalence of each class within an unknown bag of examples. Most existing quantification methods rely on prior probability shift assumptions to build a quantification model that turns the predictions of an underlying classifier into optimal prevalence estimates. In this work, we present an end-to-end neural network that uses Gaussian distributions in latent spaces to obtain invariant representations of bags of examples. This approach treats quantification as a direct optimization problem: deep learning makes it possible to optimize loss functions specific to the task, and no intermediate classifier is needed. Our method achieves state-of-the-art results compared with both traditional quantification methods and other deep learning approaches for quantification. The code needed to reproduce all our experiments is publicly available at https://github.com/AICGijon/gmnet.
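To make the idea of a Gaussian-based, order-invariant bag representation concrete, the following is a minimal sketch and not the architecture from the paper or the linked repository. It assumes that each example has already been mapped to a one-dimensional latent value, that the Gaussian parameters are fixed rather than learned, and that the output weights are an untrained placeholder. All function names (`bag_representation`, `predict_prevalence`, `absolute_error`) are hypothetical. Averaging the Gaussian densities over the bag gives a vector that does not depend on the order or the number of examples, and a softmax on top yields a prevalence vector that a task-specific loss such as the absolute error can score directly.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bag_representation(bag, means, stds):
    """Average density of each Gaussian over every example in the bag.

    The mean over examples makes the result independent of example
    order and of bag size.
    """
    n = len(bag)
    return [sum(gaussian_pdf(x, m, s) for x in bag) / n
            for m, s in zip(means, stds)]


def softmax(scores):
    """Map arbitrary scores to non-negative values that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_prevalence(bag, means, stds, weights):
    """Linear layer plus softmax on top of the bag representation."""
    rep = bag_representation(bag, means, stds)
    scores = [sum(w * r for w, r in zip(row, rep)) for row in weights]
    return softmax(scores)


def absolute_error(true_prev, pred_prev):
    """Mean absolute difference between two prevalence vectors."""
    return sum(abs(t - p) for t, p in zip(true_prev, pred_prev)) / len(true_prev)


# Toy bag (assumption): 30 latent values near -1 and 70 near +1.
rng = random.Random(0)
bag = ([rng.gauss(-1.0, 0.5) for _ in range(30)]
       + [rng.gauss(1.0, 0.5) for _ in range(70)])

means, stds = [-1.0, 1.0], [0.5, 0.5]   # fixed here; learned in a real model
weights = [[1.0, 0.0], [0.0, 1.0]]      # untrained placeholder weights

pred = predict_prevalence(bag, means, stds, weights)
print(pred, absolute_error([0.3, 0.7], pred))
```

In an end-to-end setting, the feature extractor, the Gaussian parameters, and the output weights would all be trained jointly by minimizing the prevalence loss over many labeled bags, which is what removes the need for a separate classifier.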