Maximization of mutual information between the model's input and output is formally related to "decisiveness" and "fairness" of the softmax predictions, motivating such unsupervised entropy-based losses for discriminative neural networks. Recent self-labeling methods based on such losses represent the state of the art in deep clustering. However, some important properties of entropy clustering are not well known or are even misunderstood. For example, we provide a counterexample to prior claims about its equivalence to variance clustering (K-means) and point out technical mistakes in such theories. We discuss the fundamental differences between these discriminative and generative clustering approaches. Moreover, we show that standard entropy clustering is susceptible to narrow margins and motivate an explicit margin maximization term. We also propose an improved self-labeling loss that is robust to pseudo-labeling errors and enforces stronger fairness. We develop an EM algorithm for our loss that is significantly faster than the standard alternatives. Our results improve the state of the art on standard benchmarks.
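The abstract does not spell out the relation between mutual information and the two softmax properties. A form commonly used in the entropy-clustering literature estimates mutual information as the entropy of the average prediction ("fairness") minus the average entropy of the individual predictions ("decisiveness"). The NumPy sketch below illustrates that standard estimate only; it is not the improved self-labeling loss or the EM algorithm proposed here, and the function names (`softmax`, `entropy`, `mutual_information`) and the `eps` smoothing constant are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats; eps (an illustrative choice) avoids log(0)."""
    return -(p * np.log(p + eps)).sum(axis=axis)


def mutual_information(probs):
    """Estimate of I(X; Y) from an (N, K) array of softmax outputs.

    fairness:     entropy of the average prediction (high when clusters are balanced)
    decisiveness: average entropy of each prediction (low when predictions are confident)
    """
    fairness = entropy(probs.mean(axis=0))
    decisiveness = entropy(probs, axis=1).mean()
    return fairness - decisiveness


# Confident and balanced predictions over K = 3 clusters.
confident_balanced = softmax(50.0 * np.eye(3))
# Maximally uncertain predictions: every sample is uniform over clusters.
uniform = np.full((3, 3), 1.0 / 3.0)
# Confident but collapsed: every sample goes to cluster 0.
collapsed = softmax(50.0 * np.tile(np.array([1.0, 0.0, 0.0]), (3, 1)))

mi_balanced = mutual_information(confident_balanced)
mi_uniform = mutual_information(uniform)
mi_collapsed = mutual_information(collapsed)
```

Under these assumptions, the estimate approaches its maximum of log K only when predictions are both confident and balanced across clusters; uniform predictions and a collapsed single-cluster solution both give a value near zero. An unsupervised loss of this type would minimize the negative of this quantity.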