Real-world data is frequently noisy and ambiguous. In crowdsourcing, for example, human annotators may assign conflicting class labels to the same instance. Partial-label learning (PLL) addresses this challenge by training classifiers when each instance is associated with a set of candidate labels, only one of which is correct. Early PLL methods approximate the true label posterior but are often computationally intensive; recent deep learning approaches improve scalability but rely on surrogate losses and heuristic label refinement. We introduce a probabilistic framework that directly approximates the posterior distribution over true labels using amortized variational inference. Our method employs neural networks to predict variational parameters from input data, enabling efficient inference. This approach combines the expressiveness of deep learning with the rigor of probabilistic modeling while remaining architecture-agnostic. Theoretical analysis supports the approach, and extensive experiments on synthetic and real-world datasets show that it achieves state-of-the-art accuracy and efficiency.
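The abstract does not specify the model, but the core idea of an amortized posterior restricted to the candidate set can be sketched as follows. This is a minimal illustration, assuming the network outputs per-class logits and the variational posterior is a softmax masked to the candidate labels; the function names are hypothetical, not from the paper.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def candidate_posterior(logits, candidate_mask):
    """Amortized variational posterior q(y | x) over the true label.

    `logits` come from a neural network applied to the input x;
    masking non-candidate classes to -inf keeps all probability
    mass on the candidate label set (a simplifying assumption).
    """
    masked = np.where(candidate_mask, logits, -np.inf)
    return softmax(masked)

# Toy example: 3 classes, candidate set {0, 2} for one instance.
logits = np.array([2.0, 0.5, 1.0])
mask = np.array([True, False, True])
q = candidate_posterior(logits, mask)
```

In a full method, the same network produces these variational parameters for every instance in one forward pass, which is what makes the inference amortized and efficient.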