Many natural and social science systems are described using probability distributions over elements that are related to each other: for instance, occupations with shared skills or species with similar traits. Standard information-theoretic quantities such as entropies and $f$-divergences treat elements interchangeably and are blind to the similarity structure. We introduce a family of divergences that are sensitive to the geometry of the underlying domain. By virtue of being the Bregman divergences of structure-aware entropies, they provide a framework that retains several advantages of Kullback-Leibler divergence and Shannon entropy. Structure-aware divergences recover planted patterns in a synthetic clustering task that conventional divergences miss and are orders of magnitude faster than optimal transport distances. We demonstrate their applicability in economic geography and ecology, where structure plays an important role. Modelling different notions of occupation relatedness yields qualitatively different regionalisations of their geographic distribution. Our methods also reproduce established insights into functional $\beta$-diversity in ecology obtained with optimal transport methods.