The Vendi Score: A Diversity Evaluation Metric for Machine Learning

Diversity is an important criterion for many areas of machine learning (ML), including generative modeling and dataset curation. However, existing metrics for measuring diversity are often domain-specific and limited in flexibility. In this paper, we address the diversity evaluation problem by proposing the Vendi Score, which connects and extends ideas from ecology and quantum statistical mechanics to ML. The Vendi Score is defined as the exponential of the Shannon entropy of the eigenvalues of a similarity matrix. This matrix is induced by a user-defined similarity function applied to the sample to be evaluated for diversity. By taking a similarity function as input, the Vendi Score enables its user to specify any desired form of diversity. Importantly, unlike many existing metrics in ML, the Vendi Score does not require a reference dataset or a distribution over samples or labels. It is therefore general and applicable to any generative model, decoding algorithm, or dataset from any domain where similarity can be defined. We showcase the Vendi Score on molecular generative modeling, where we found that it addresses shortcomings of the current diversity metric of choice in that domain. We also applied the Vendi Score to generative models of images and decoding algorithms of text, where we found that it confirms known results about diversity in those domains. Furthermore, we used the Vendi Score to measure mode collapse, a known shortcoming of generative adversarial networks (GANs). In particular, the Vendi Score revealed that even GANs that capture all the modes of a labeled dataset can be less diverse than the original dataset. Finally, the interpretability of the Vendi Score allowed us to diagnose several benchmark ML datasets for diversity, opening the door for diversity-informed data augmentation.
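The definition above can be illustrated with a minimal sketch. It is not the authors' reference implementation, and it rests on assumptions the abstract does not spell out: the similarity matrix K is symmetric positive semidefinite with k(x, x) = 1, and it is divided by the sample size n so that its eigenvalues sum to one and can be treated like a probability distribution. The names `vendi_score` and `dot_similarity` are hypothetical and chosen only for illustration.

```python
import numpy as np


def vendi_score(samples, similarity):
    """Exponential of the Shannon entropy of the eigenvalues of K / n.

    Assumes `similarity` is symmetric, positive semidefinite, and
    returns 1 for identical inputs (an assumption of this sketch).
    """
    n = len(samples)
    # Build the n x n similarity matrix from the user-supplied function.
    K = np.array(
        [[similarity(a, b) for b in samples] for a in samples], dtype=float
    )
    # eigvalsh is appropriate because K is assumed symmetric.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop zero (or numerically negative) eigenvalues: 0 * log 0 is taken as 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


def dot_similarity(a, b):
    """Hypothetical similarity: dot product of unit-norm vectors."""
    return float(np.dot(a, b))


# Four mutually orthogonal unit vectors: maximally diverse under this similarity.
distinct = [np.eye(4)[i] for i in range(4)]
# Four copies of the same unit vector: no diversity at all.
identical = [np.array([1.0, 0.0, 0.0, 0.0]) for _ in range(4)]

score_distinct = vendi_score(distinct, dot_similarity)
score_identical = vendi_score(identical, dot_similarity)
```

Under these assumptions, `score_distinct` comes out at approximately 4 and `score_identical` at approximately 1, which matches the reading of the score as an effective number of distinct items in the sample. Swapping in a different similarity function changes which notion of diversity is measured, without touching the rest of the computation.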
