Partitioning a set of elements into subsets of a priori unknown sizes is essential in many applications. These subset sizes are rarely learned explicitly, whether they are the cluster sizes in clustering applications or the number of shared versus independent generative latent factors in weakly-supervised learning. Probability distributions over correct combinations of subset sizes are non-differentiable due to hard constraints, which prohibits gradient-based optimization. In this work, we propose the differentiable hypergeometric distribution. The hypergeometric distribution models the probability of different group sizes based on the groups' relative importance. We introduce reparameterizable gradients to learn the relative importance of groups and highlight the advantage of explicitly learning subset sizes in two typical applications: weakly-supervised learning and clustering. In both applications, we outperform previous approaches, which rely on suboptimal heuristics to model the unknown group sizes.