Real-world data is frequently noisy and ambiguous. In crowdsourcing, for example, human annotators may assign conflicting class labels to the same instances. Partial-label learning (PLL) addresses this challenge by training classifiers when each instance is associated with a set of candidate labels, only one of which is correct. While early PLL methods approximate the true label posterior, they are often computationally intensive. Recent deep learning approaches improve scalability but rely on surrogate losses and heuristic label refinement. We introduce a novel probabilistic framework that directly approximates the posterior distribution over true labels using amortized variational inference. Our method employs neural networks to predict variational parameters from input data, enabling efficient inference. This approach combines the expressiveness of deep learning with the rigor of probabilistic modeling, while remaining architecture-agnostic. Theoretical analysis and extensive experiments on synthetic and real-world datasets demonstrate that our method achieves state-of-the-art performance in both accuracy and efficiency.
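The core idea of amortizing inference over candidate labels can be illustrated with a minimal sketch. Here a network's logits are restricted to the candidate set and renormalized to form the variational posterior q(y|x); the function name `candidate_posterior` and the toy numbers are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def candidate_posterior(logits, candidate_mask):
    """Amortized variational posterior q(y|x): a softmax restricted to the
    candidate label set. Illustrative sketch, not the paper's exact model."""
    # mask out non-candidate classes before normalizing
    masked = np.where(candidate_mask, logits, -np.inf)
    # numerically stable softmax over the remaining classes
    z = masked - masked.max(axis=-1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

# toy example: 3 classes, candidate set {0, 2}
logits = np.array([2.0, 1.0, 0.5])  # predicted by a neural network from x
mask = np.array([True, False, True])
q = candidate_posterior(logits, mask)
```

Because the network predicts the variational parameters directly from the input, inference at test time is a single forward pass, which is what makes the approach efficient and architecture-agnostic.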