Partitioning a set of elements into subsets of a priori unknown sizes is essential in many applications. These subset sizes are rarely learned explicitly, whether they are the cluster sizes in clustering applications or the number of shared versus independent generative latent factors in weakly-supervised learning. Probability distributions over correct combinations of subset sizes are non-differentiable because of hard constraints, which rules out gradient-based optimization. In this work, we propose the differentiable hypergeometric distribution. The hypergeometric distribution models the probability of different group sizes based on the relative importance of the groups. We introduce reparameterizable gradients to learn this relative importance and highlight the advantage of explicitly learning the size of subsets in two typical applications: weakly-supervised learning and clustering. In both applications, we outperform previous approaches, which rely on suboptimal heuristics to model the unknown size of groups.
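To make concrete the idea of a distribution over group sizes that depends on relative importance, the following is a minimal sketch for two groups. It assumes that "relative importance" acts like the odds weight in Fisher's noncentral hypergeometric distribution; this is an illustrative assumption, not a statement of the exact formulation used in this work. The function name `fisher_nchg_pmf` and the parameters `m1`, `m2`, `n`, and `omega` are hypothetical. The sketch computes only the exact, non-differentiable probability mass function and does not implement the reparameterizable gradients.

```python
from math import comb


def fisher_nchg_pmf(m1, m2, n, omega):
    """Probability that x of the n drawn elements come from group 1.

    m1, m2: number of elements in group 1 and group 2.
    n:      total number of elements drawn (the hard constraint:
            the two subset sizes must sum to n).
    omega:  assumed importance (odds) of group 1 relative to group 2.
    """
    lo = max(0, n - m2)  # group 2 cannot supply more than m2 elements
    hi = min(n, m1)      # group 1 cannot supply more than m1 elements
    weights = {
        x: comb(m1, x) * comb(m2, n - x) * omega ** x
        for x in range(lo, hi + 1)
    }
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def mean(pmf):
    """Expected size of the subset drawn from group 1."""
    return sum(x * p for x, p in pmf.items())


# Equal importance recovers the ordinary (central) hypergeometric distribution.
equal = fisher_nchg_pmf(m1=5, m2=5, n=4, omega=1.0)
# A larger weight shifts probability mass toward larger group-1 subsets.
favoured = fisher_nchg_pmf(m1=5, m2=5, n=4, omega=3.0)

print("mean group-1 size, omega=1:", mean(equal))
print("mean group-1 size, omega=3:", mean(favoured))
```

Because the support is a finite set of integer sizes that must sum to `n`, sampling from this distribution directly gives no gradient with respect to `omega`, which is the obstacle the abstract describes.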