We consider learning a probabilistic classifier from partially labelled supervision (inputs annotated with several candidate labels) using standard neural architectures with a softmax as the final layer. We identify a bias phenomenon that can arise from the softmax layer, even in simple architectures, and that prevents proper exploration of the alternative options, making the dynamics of gradient descent overly sensitive to initialisation. We introduce a novel loss function that allows for unbiased exploration within the space of alternative outputs. We give a theoretical justification for our loss function and provide an extensive evaluation of its impact on synthetic data, on standard partially labelled benchmarks, and on a contributed novel benchmark related to an existing rule learning challenge.
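To make the setting concrete, the following is a minimal sketch of one commonly used baseline for partial-label learning with a softmax output: the negative log of the total probability assigned to the candidate set. This baseline is an assumption chosen for illustration; it is not the novel loss proposed here, and it is not claimed to be the exact source of the bias analysed in this work. The function names (`partial_label_nll`, `partial_label_nll_grad`) and the toy logits are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def partial_label_nll(logits, candidates):
    """Baseline loss (assumed for illustration): -log of the softmax mass
    placed on the candidate label set."""
    p = softmax(logits)
    return -np.log(p[candidates].sum())


def partial_label_nll_grad(logits, candidates):
    """Gradient of the baseline loss with respect to the logits.

    For logit i the gradient is p_i - q_i, where q is the softmax
    distribution renormalised over the candidate set (zero elsewhere).
    """
    p = softmax(logits)
    mask = np.zeros_like(p)
    mask[candidates] = 1.0
    q = mask * p / (mask * p).sum()
    return p - q


# Toy example (hypothetical numbers): three classes, candidates {0, 1},
# with class 0 slightly favoured at initialisation.
logits = np.array([0.2, 0.0, 0.0])
candidates = [0, 1]
grad = partial_label_nll_grad(logits, candidates)
print("loss:", partial_label_nll(logits, candidates))
print("grad:", grad)
```

Under this baseline, a gradient-descent step raises each candidate logit by an amount proportional to that candidate's current share of the candidate-set probability, so the candidate that starts ahead pulls further ahead. This is one simple way in which initialisation can influence which alternative is selected.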