We propose a new approach for fine-grained uncertainty quantification (UQ) using a collision matrix. For a classification problem involving $K$ classes, the $K\times K$ collision matrix $S$ measures the inherent (aleatoric) difficulty in distinguishing between each pair of classes. In contrast to existing UQ methods, the collision matrix gives a more detailed, pairwise picture of classification difficulty. We discuss several possible downstream applications of the collision matrix, establish its fundamental mathematical properties, and show its relationship to existing UQ methods, including the Bayes error rate. We also address the new problem of estimating the collision matrix from one-hot labeled data, and we estimate $S$ in a sequence of steps. First, we learn a contrastive binary classifier that takes two inputs and determines whether they belong to the same class. We then show that this contrastive classifier (which is PAC learnable) can be used to reliably estimate the Gramian matrix of $S$, defined as $G=S^TS$. Finally, we show that, under very mild assumptions, $S$ can be uniquely recovered from $G$, a new result on stochastic matrices that could be of independent interest. Experimental results on several datasets validate our methods.
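To make the relationship $G=S^TS$ concrete, the following is a minimal sketch on a toy example. The 3-class matrix `S` is hypothetical and is assumed, for illustration only, to be row-stochastic, symmetric, and positive definite. In that special case the unique positive semidefinite square root of $G$ equals $S$, so it can be recovered with an eigendecomposition. This is a stand-in for intuition, not the recovery procedure or the assumptions used in the paper.

```python
import numpy as np

# Hypothetical 3-class collision matrix (illustrative values, not from the paper).
# Assumed here: rows sum to 1, and the matrix is symmetric positive definite.
S = np.array([
    [0.90, 0.08, 0.02],
    [0.08, 0.80, 0.12],
    [0.02, 0.12, 0.86],
])
assert np.allclose(S.sum(axis=1), 1.0)

# Gramian matrix as defined in the abstract: G = S^T S.
G = S.T @ S

# Special-case recovery: for symmetric positive definite S, G = S^2,
# so the principal (PSD) square root of G is S itself.
eigvals, eigvecs = np.linalg.eigh(G)
S_rec = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T

print("G =\n", G)
print("max recovery error:", np.abs(S_rec - S).max())
```

In general, $G$ determines $S$ only up to an orthogonal transformation, which is why a uniqueness result for stochastic matrices is needed; the symmetric case above avoids that ambiguity by construction.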